mlprep
ML Breadth · medium · 12 min

Walk me through how convolutional neural networks work. What does a convolution actually compute, and why does the architecture make sense for images?

formulate your answer, then —

You mentioned weight sharing enforces translation equivariance — what's the difference between equivariance and invariance, and where does that distinction matter in practice?

formulate your answer, then —

tldr

CNNs exploit locality and translation structure in images. A convolution applies a learned filter at every spatial position (weight sharing), producing a feature map that tracks where patterns appear. Stacked layers build from low-level edges to high-level concepts. CNNs are translation-equivariant (features track position) and approximately invariant after pooling. Standard CNNs aren't rotation or scale invariant — data augmentation is the practical fix.
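The sliding-filter computation and the equivariance-vs-invariance distinction above can be sketched in a few lines of NumPy. This is a toy illustration, not a library implementation: `conv2d_valid` and `maxpool2d` are hand-rolled here, and the Sobel-style filter and 6×6 test image are chosen just to make the effect visible.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2D 'valid' cross-correlation (what deep-learning libraries
    call convolution): one shared kernel slides over every position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Same weights at every (i, j): this IS weight sharing.
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def maxpool2d(x, k=2):
    """Non-overlapping k x k max pooling (drops any ragged edge)."""
    H, W = x.shape
    return x[:H // k * k, :W // k * k].reshape(H // k, k, W // k, k).max(axis=(1, 3))

# Toy image: a single bright vertical line, plus a vertical-edge filter.
img = np.zeros((6, 6))
img[:, 2] = 1.0
sobel_x = np.array([[1., 0., -1.],
                    [2., 0., -2.],
                    [1., 0., -1.]])

resp = conv2d_valid(img, sobel_x)

# Equivariance: shift the input one column right -> the feature map
# shifts by exactly the same amount (features track position).
shifted = np.roll(img, 1, axis=1)
resp_shifted = conv2d_valid(shifted, sobel_x)
assert np.allclose(resp[:, :-1], resp_shifted[:, 1:])

# Approximate invariance after pooling: the small shift is absorbed
# inside the 2x2 pooling windows, so the pooled maps are identical.
assert np.array_equal(maxpool2d(resp), maxpool2d(resp_shifted))
```

The two assertions are the whole point: before pooling the response moves with the input (equivariance); after pooling a sub-window shift disappears (invariance). A rotation or large scale change would break both assertions, which is why augmentation is the practical fix.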

follow-up

  • How does a 1×1 convolution differ from a standard convolution and when is it useful?
  • What is batch normalization doing in a CNN and why does it usually go between the convolution and the activation?
  • How would you adapt a classification CNN backbone for a dense prediction task like semantic segmentation?