Design the ML system that detects harmful content on YouTube — videos that violate policy (violence, hate speech, CSAM, misinformation). YouTube receives 500 hours of video per minute. Walk me through how you'd detect violations at that scale, how you'd handle the human review pipeline, and how you'd measure whether the system is working.
You mentioned misinformation as a violation category — but misinformation is much harder to classify than graphic violence. A video claiming a vaccine causes harm might be false, true, contextually misleading, or legitimate medical debate. How do you build an ML system for something that requires nuanced factual judgment?
Design the human review and appeals system. How do reviewers interact with ML, and how do appeal outcomes improve the system?
How do you handle adversarial creators who adapt once they learn what the system catches?
tldr
Harmful-content detection is a policy-routing system, not a single classifier: multimodal extraction, policy-specific risk scores, action bands, human review, appeals, post-publish monitoring, and adversarial response. Calibrate the precision/recall tradeoff by policy severity. For misinformation, ML routes content and matches it against known claims; humans and domain experts handle novel or contested judgments. Reviewer operations, appeals, and evasion response are part of the core ML system.
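To make the mapping from policy-specific risk scores to action bands concrete, here is a minimal Python sketch; the policy names, threshold values, and the `route` function are illustrative assumptions, not part of the original answer. The design choice it shows is that severity calibrates the bands: the highest-severity policies auto-action at lower scores (favoring recall), while lower-severity, judgment-heavy policies such as misinformation send more traffic to human review (favoring precision).

```python
from enum import Enum

class Action(Enum):
    ALLOW = 0
    LIMIT_DISTRIBUTION = 1
    HUMAN_REVIEW = 2
    REMOVE = 3

# Hypothetical per-policy thresholds: (auto_remove, human_review, limit_distribution).
# Higher-severity policies act at lower scores; judgment-heavy policies lean on review.
POLICY_BANDS = {
    "csam":           (0.70, 0.30, 0.10),
    "violence":       (0.90, 0.60, 0.30),
    "hate_speech":    (0.92, 0.65, 0.35),
    "misinformation": (0.98, 0.70, 0.40),
}

def route(policy_scores: dict[str, float]) -> tuple[Action, str | None]:
    """Map calibrated per-policy risk scores to the most severe applicable action."""
    decision, trigger = Action.ALLOW, None
    for policy, score in policy_scores.items():
        remove_t, review_t, limit_t = POLICY_BANDS[policy]
        if score >= remove_t:
            action = Action.REMOVE
        elif score >= review_t:
            action = Action.HUMAN_REVIEW
        elif score >= limit_t:
            action = Action.LIMIT_DISTRIBUTION
        else:
            action = Action.ALLOW
        # Keep the strictest action triggered by any single policy.
        if action.value > decision.value:
            decision, trigger = action, policy
    return decision, trigger

# Example: high violence score but low scores elsewhere -> queued for human review.
print(route({"csam": 0.01, "violence": 0.72, "hate_speech": 0.10, "misinformation": 0.05}))
```

In practice the thresholds would be set from calibrated score distributions and reviewer capacity rather than hand-picked constants, but the band structure is what makes severity-specific precision/recall tradeoffs explicit.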
follow-up
- How would you design the appeals system — when a creator disputes a removal, how does ML assist the human reviewer, and how do you use appeal outcomes to improve the classifier?
- Coordinated inauthentic behavior (bot networks artificially boosting content) is often harder to detect than the content itself. How would you approach detecting it?
- How do you handle the cold start problem for a brand new channel — you have no history to assess, and bad actors specifically create new channels to avoid history-based signals?