Walk me through how you'd deploy a new model version to production safely. What strategies exist and how do you choose between them?
You mentioned shadow mode — but you said the model's output is discarded. How do you actually evaluate whether the shadow model is better if users never see it?
tldr
Safe model deployment is about limiting blast radius while gathering live evidence. Shadow mode gives you predictions on the production traffic distribution without user risk: the shadow model's outputs are logged rather than served, so you can compare its prediction distribution against the live model's and score both offline once ground-truth labels arrive. A canary (progressive) rollout is the workhorse for most deployments. A/B testing adds statistical rigor when you're measuring business outcomes. Because users never see or act on shadow predictions, shadow mode can't capture feedback-loop effects, so you need a canary to close that gap.
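A rough sketch of the two mechanics mentioned here, assuming a classification model with integer labels. The names `ShadowRecord`, `evaluate_shadow`, `in_canary`, and the `labels` mapping are hypothetical and exist only for illustration. The first function scores live and shadow predictions on the requests whose ground truth has arrived; the second shows one common way to assign a stable slice of users to a canary by hashing their ID.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ShadowRecord:
    request_id: str
    live_pred: int    # prediction actually served to the user
    shadow_pred: int  # candidate prediction, logged and never served


def evaluate_shadow(records, labels):
    """Compare live and shadow predictions on requests that have labels.

    `labels` maps request_id -> ground-truth label (a hypothetical
    delayed-label store). Returns None if nothing can be scored yet.
    """
    scored = [r for r in records if r.request_id in labels]
    if not scored:
        return None
    n = len(scored)
    return {
        "n": n,
        # How often the two models agree, regardless of correctness.
        "agreement": sum(r.live_pred == r.shadow_pred for r in scored) / n,
        "live_accuracy": sum(r.live_pred == labels[r.request_id] for r in scored) / n,
        "shadow_accuracy": sum(r.shadow_pred == labels[r.request_id] for r in scored) / n,
    }


def in_canary(user_id: str, percent: float) -> bool:
    """Deterministically route roughly `percent` of users to the canary.

    Hashing the user ID keeps each user on the same model across
    requests, and raising `percent` only ever adds users to the canary.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # bucket in 0..9999
    return bucket < percent * 100


if __name__ == "__main__":
    records = [
        ShadowRecord("a", live_pred=1, shadow_pred=1),
        ShadowRecord("b", live_pred=0, shadow_pred=1),
        ShadowRecord("c", live_pred=1, shadow_pred=0),
        ShadowRecord("d", live_pred=0, shadow_pred=0),  # label not yet available
    ]
    labels = {"a": 1, "b": 1, "c": 1}
    print(evaluate_shadow(records, labels))
    print(in_canary("user-123", percent=5))
```

Offline accuracy on logged traffic only says how the candidate would have scored on inputs shaped by the current model. It says nothing about how users would react to the candidate's outputs, which is why the canary step is still needed.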
follow-up
- How do you handle the case where your model's predictions drive downstream features — how do you break that feedback loop during evaluation?
- What's the rollback strategy if a model has been in production for two weeks and you've already retrained downstream models on its outputs?
- How would you design a deployment system for a model that serves 100k requests per second?