What is contrastive loss and triplet loss? When would you use metric learning approaches over cross-entropy classification?
formulate your answer first, then read on.
tldr
Contrastive loss: pull similar pairs together, push dissimilar pairs apart until they clear a margin. Triplet loss: the anchor must be closer to the positive than to the negative by a margin m, so it learns relative distances rather than absolute ones. Both saturate on easy pairs or triplets (zero loss and zero gradient once the margin is satisfied), so hard/semi-hard mining is essential. Modern self-supervised methods (SimCLR, CLIP) use InfoNCE, which treats the rest of the batch as negatives and avoids explicit mining. Use metric learning when classes are open-set or you need embedding geometry; use cross-entropy for a fixed class set.
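A minimal PyTorch sketch of the three losses above. Function names, the margins, and the temperature value are illustrative defaults, not a reference implementation:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pairwise contrastive loss.

    z1, z2: (B, D) embeddings; same: (B,) float, 1 if the pair is similar, 0 otherwise.
    Similar pairs are pulled together; dissimilar pairs are pushed past `margin`.
    """
    d = F.pairwise_distance(z1, z2)
    pos = same * d.pow(2)                       # similar: penalize any distance
    neg = (1 - same) * F.relu(margin - d).pow(2)  # dissimilar: penalize only inside the margin
    return (pos + neg).mean()

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Anchor must be closer to the positive than to the negative by `margin`.

    Easy triplets (already satisfying the margin) give zero gradient,
    which is why hard/semi-hard mining matters in practice.
    """
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

def info_nce(z1, z2, tau=0.07):
    """InfoNCE over a batch: z1[i] and z2[i] are two views of the same item;
    every other row in the batch acts as a negative, so no explicit mining.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau           # (B, B) cosine similarities scaled by temperature
    targets = torch.arange(z1.size(0))   # the matching pair sits on the diagonal
    return F.cross_entropy(logits, targets)

# toy usage with random embeddings
B, D = 8, 16
a, p, n = torch.randn(B, D), torch.randn(B, D), torch.randn(B, D)
print(contrastive_loss(a, p, torch.ones(B)))
print(triplet_loss(a, p, n))
print(info_nce(a, p))
```

Note this `info_nce` scores only one direction; SimCLR's NT-Xent symmetrizes over both views, and CLIP averages the image-to-text and text-to-image cross-entropies.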
follow-up
- What is the role of the temperature τ in InfoNCE/NT-Xent and how does it affect what the model learns?
- How does CLIP's training objective differ from standard supervised contrastive learning?
- If you had to build a product image similarity system at scale, would you use triplet loss or something else? Walk me through your design.