mlprep
ML Breadth · medium · 12 min

Explain data augmentation as regularization. How do you decide which augmentations are valid for a task?

tldr

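Data augmentation regularizes a model by training it on transformed copies of each example whose label is unchanged, so it learns label-preserving invariances instead of memorizing incidental details. It helps when the transformations reflect variation the model will see in deployment. It hurts when a transformation changes the label, removes the evidence the label depends on, or produces examples that could never occur in practice.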
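A minimal sketch of why validity depends on the task, using only the Python standard library. The toy image, the `side_label` task, and the helper names `hflip` and `augment` are hypothetical and exist only for illustration. A horizontal flip is a reasonable augmentation when the label ignores left/right orientation, but here the label is the orientation, so the same flip silently corrupts the label.

```python
import random


def hflip(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def augment(image, label, p=0.5, rng=random):
    """Flip with probability p and keep the label unchanged.

    Keeping the label is only correct if the task is flip-invariant.
    """
    if rng.random() < p:
        return hflip(image), label
    return image, label


def side_label(image):
    """Hypothetical toy task: which half holds the brightest column?"""
    width = len(image[0])
    col = max(range(width), key=lambda c: sum(row[c] for row in image))
    return "left" if col < width / 2 else "right"


image = [[0, 9, 0, 0],
         [0, 9, 0, 0]]

# The true label changes under the flip, so `augment` would attach
# the stale label "left" to an image whose correct label is "right".
print(side_label(image), side_label(hflip(image)))  # prints: left right
```

The same check generalizes: if a labeler you trust (a human, or a rule like `side_label`) would assign a different label after the transformation, the transformation is not valid for that task.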

follow-up

  • Why is augmentation easier in vision than in NLP?
  • How would you validate that an augmentation policy is safe?
  • What are Mixup and CutMix, and why can they improve robustness?
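For the last follow-up, a rough standard-library sketch of the two techniques may help. Mixup blends two examples and their one-hot labels with a weight drawn from a Beta distribution; CutMix pastes a rectangle from one image into another and mixes the labels by the area that was actually replaced. The function names, the nested-list image format, and the default `alpha` values are assumptions made for illustration, not a reference implementation.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two flat feature vectors and their one-hot labels.

    lam is drawn from Beta(alpha, alpha); alpha is an assumed default.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def cutmix(img1, y1, img2, y2, alpha=1.0, rng=random):
    """Paste a random rectangle of img2 into a copy of img1.

    Images are equally sized 2D lists. The label weight is recomputed
    from the pasted area, because the box is clipped at the borders.
    """
    h, w = len(img1), len(img1[0])
    lam = rng.betavariate(alpha, alpha)
    cut = (1 - lam) ** 0.5                  # side ratio of the box
    ch, cw = int(h * cut), int(w * cut)
    cy, cx = rng.randrange(h), rng.randrange(w)
    top, bottom = max(cy - ch // 2, 0), min(cy + ch // 2, h)
    left, right = max(cx - cw // 2, 0), min(cx + cw // 2, w)

    out = [row[:] for row in img1]          # do not mutate the input
    for r in range(top, bottom):
        out[r][left:right] = img2[r][left:right]

    lam = 1 - (bottom - top) * (right - left) / (h * w)
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return out, y, lam
```

In both cases the target becomes a soft label rather than a single class, which is one commonly cited reason these methods discourage overconfident predictions between training examples.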