mlprep
ML Breadth · hard · 12 min

Explain label smoothing and knowledge distillation. Why can softer targets improve generalization, and what are the tradeoffs?

Formulate your answer, then compare it with the summary below.

tldr

Label smoothing replaces one-hot targets with a mixture of the true label and a uniform distribution, penalizing overconfident predictions. Knowledge distillation trains a student on a teacher's temperature-softened probabilities, transferring the teacher's inter-class similarity structure and enabling cheaper serving. Both soften targets and often improve generalization, but they can also distort calibration, and distillation can transfer the teacher's biases and errors to the student.
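For concreteness, a minimal sketch of both losses, assuming PyTorch and logits of shape (batch, num_classes); the hyperparameter values (eps, T, alpha) are illustrative, not prescribed by the question:

```python
import torch
import torch.nn.functional as F


def label_smoothing_loss(logits, targets, eps=0.1):
    # Smoothed target: (1 - eps) * one-hot + eps * uniform over all classes.
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = F.one_hot(targets, n_classes).float() * (1.0 - eps) + eps / n_classes
    return -(smooth * log_probs).sum(dim=-1).mean()


def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    # Hard-label cross-entropy plus KL divergence to the teacher's
    # temperature-softened distribution; the T**2 factor keeps the soft-target
    # gradients on a scale comparable to the hard-label term as T changes.
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```

In practice the teacher's logits are computed in eval mode with gradients disabled, and alpha and T are tuned on a validation set.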

follow-up

  • Why does temperature help in distillation?
  • When can label smoothing hurt calibration?
  • How would you validate a distilled model before replacing a larger teacher in production?