A model you deployed three months ago is showing declining performance. Walk me through how you'd set up monitoring to detect this proactively, distinguish between data drift and concept drift, and decide when to trigger retraining.
formulate your own answer first, then compare:
tldr
Monitor three layers: input features (PSI/KS for drift), model predictions (score distribution), and business outcomes (ground-truth labels, lagged). Distinguish data drift (feature distributions change) from concept drift (the label relationship changes) — they have different root causes and fixes. Instrument your prediction logs from day one; retrofitting observability is expensive.
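A minimal sketch of the feature-layer check using NumPy/SciPy. The `psi` helper and the synthetic data are illustrative, not a production implementation; the 0.1 / 0.25 cutoffs in the comments are the common rule-of-thumb PSI thresholds, not hard limits.

```python
import numpy as np
from scipy import stats

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a current sample."""
    # Bin edges come from the baseline (training-time) distribution.
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # capture out-of-range production values
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    # Floor the fractions to avoid log(0) on empty bins.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)  # training-time feature values
stable   = rng.normal(0.0, 1.0, 10_000)  # production window, no drift
shifted  = rng.normal(0.5, 1.0, 10_000)  # production window, mean shift

# Rule of thumb: PSI < 0.1 stable, 0.1-0.25 investigate, > 0.25 actionable drift.
print(psi(baseline, stable))
print(psi(baseline, shifted))

# KS two-sample test as a second opinion; with large windows, pair the
# p-value with an effect-size check so trivial shifts don't page anyone.
print(stats.ks_2samp(baseline, shifted).pvalue)
```

In practice you would run this per feature on a rolling window against a frozen training-time reference, and apply the same machinery to the prediction-score distribution (the second monitoring layer) since score drift often shows up before any single input feature crosses a threshold.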
follow-up
- How would you handle monitoring for a model with 48-hour delayed labels, like a churn model?
- What's the difference between retraining from scratch vs. fine-tuning on recent data? When would you choose each?
- How would you set up shadow scoring — running both an old and new model in parallel — without impacting production latency?