Walk me through how modern object detection works. How would you frame the space of approaches?
formulate your answer, then —
You mentioned NMS to deduplicate detections. Can you walk through exactly how non-maximum suppression works and where it breaks down?
formulate your answer, then —
tldr
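Object detection = bounding box regression + classification, done jointly. Two-stage detectors (Faster R-CNN) propose regions, then classify and refine each one; one-stage detectors (YOLO) predict boxes and classes directly from a grid. Anchor boxes are predefined candidate shapes that the model refines with predicted offsets. NMS deduplicates predictions greedily: keep the highest-confidence box, drop every remaining box whose IoU with it exceeds a threshold, and repeat. It fails when distinct objects overlap heavily (e.g., crowds), because a correct but lower-scoring box gets suppressed as a duplicate.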
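To make the NMS part concrete, here is a minimal pure-Python sketch of greedy NMS. The names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions chosen for illustration, not taken from any particular library; real pipelines usually run a vectorized version separately for each class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit candidates in order of decreasing confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two near-duplicate boxes on one object, plus one distant object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # [0, 2]: the duplicate at index 1 is suppressed

# Failure case: two distinct objects side by side (IoU = 70/130, about 0.54).
crowd_boxes = [(0, 0, 10, 10), (3, 0, 13, 10)]
crowd_kept = nms(crowd_boxes, [0.9, 0.85])  # [0]: the second object is lost
```

The second call shows the breakdown: NMS sees only geometry and scores, so it cannot tell a duplicate prediction from a genuinely separate object that happens to overlap. Raising the threshold keeps more true positives in crowds but lets more duplicates through.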
follow-up
- How would you adapt an object detection pipeline to detect very small objects in high-resolution satellite imagery?
- What is the precision-recall tradeoff in detection and how does mAP summarize it?
- How does DETR's approach eliminate the need for anchor boxes and NMS, and what does it trade off?