mlprep / ML Breadth · medium · 10 min

What is regularization and how do the different techniques compare? When would you pick one over another?

formulate your answer, then —

You said L1 drives weights to exactly zero but L2 doesn't. Intuitively, why does the math produce that difference?

formulate your answer, then —

tldr

Regularization constrains effective model capacity to reduce overfitting. L2 shrinks each weight in proportion to its magnitude, so weights become small but never exactly zero. L1 shrinks each weight by a constant amount, so small weights are pushed to exactly zero (implicit feature selection). Dropout randomly silences neurons during training, forcing redundant representations. Batch norm stabilizes layer input distributions and adds a mild regularizing effect through the noise in batch statistics. In practice: weight decay (L2 / AdamW) by default, L1 when you need sparsity or feature selection, dropout in large fully connected layers.
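One way to see why L1 zeroes weights and L2 does not, sketched for plain (sub)gradient descent on a loss $L(w) + \lambda R(w)$ with step size $\eta$:

L2 penalty, $R(w) = \tfrac{1}{2}\lVert w \rVert_2^2$:

$$w \leftarrow w - \eta\,\nabla L(w) - \eta\lambda\,w = (1 - \eta\lambda)\,w - \eta\,\nabla L(w)$$

The pull toward zero is proportional to the weight itself, so it decays geometrically and never lands exactly on zero.

L1 penalty, $R(w) = \lVert w \rVert_1$, in its proximal (soft-thresholding) form:

$$w \leftarrow \operatorname{sign}(z)\,\max(\lvert z \rvert - \eta\lambda,\ 0), \qquad z = w - \eta\,\nabla L(w)$$

The pull toward zero has constant size $\eta\lambda$ regardless of the weight's magnitude, so any weight the data gradient cannot keep above that threshold snaps to exactly zero. That is where the sparsity comes from.

A minimal sketch of the practical difference, assuming scikit-learn and a synthetic dataset (the alpha values are arbitrary, chosen only for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression: 50 features, only 5 actually informative.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# Same data, L1 (Lasso) vs. L2 (Ridge) penalty.
lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# L1 zeroes out most of the uninformative coefficients;
# L2 leaves them small but nonzero.
print("Lasso nonzero coefficients:", np.count_nonzero(lasso.coef_), "of 50")
print("Ridge nonzero coefficients:", np.count_nonzero(ridge.coef_), "of 50")
```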

follow-up

  • How does early stopping relate to L2 regularization, and in what sense are they equivalent?
  • Why does dropout not work well with batch normalization, and how does layer normalization compare?
  • How would you diagnose whether a model is underfitting vs. overfitting and choose the right regularization response?