You train a model daily on logged features. At serving time, the same features are computed in real time. What can go wrong, and how do you catch it before it hurts users?
formulate your answer, then —
tldr
Train-serve skew: feature values differ between training and serving because of duplicated computation logic, time-window misalignment, late-arriving events, or different join semantics, not because the world changed. Detection: log the features actually used at serving time and compare their distributions to the training features. Prevention: a feature store with a shared code path for training retrieval and serving lookup; point-in-time correctness prevents future information leaking into training. Shadow validation plus distribution monitoring catch regressions before users are affected.
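Detection in practice is usually a batch job that joins the features logged at serving time back to the training set and compares per-feature distributions. Below is a minimal sketch, assuming both sides are available as pandas DataFrames with matching column names; the PSI threshold of 0.2, the KS significance cutoff, and the function names are illustrative assumptions, not a standard API.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

PSI_ALERT = 0.2    # common rule-of-thumb threshold; tune per feature
KS_P_ALERT = 0.01  # flag features whose distributions differ at this significance

def psi(train: pd.Series, serve: pd.Series, bins: int = 10) -> float:
    """Population Stability Index of serving values relative to training values."""
    train, serve = train.dropna(), serve.dropna()
    # Bin edges come from training quantiles, so PSI measures how serving
    # traffic has shifted relative to what the model was trained on.
    edges = np.unique(np.quantile(train, np.linspace(0, 1, bins + 1)))
    if len(edges) < 2:
        return 0.0  # degenerate (near-constant) training feature
    serve = serve.clip(edges[0], edges[-1])  # keep out-of-range serving values in the end bins
    train_pct = np.histogram(train, bins=edges)[0] / max(len(train), 1)
    serve_pct = np.histogram(serve, bins=edges)[0] / max(len(serve), 1)
    # Clip to avoid log(0) in empty bins.
    train_pct = np.clip(train_pct, 1e-6, None)
    serve_pct = np.clip(serve_pct, 1e-6, None)
    return float(np.sum((serve_pct - train_pct) * np.log(serve_pct / train_pct)))

def skew_report(train_df: pd.DataFrame, serve_df: pd.DataFrame) -> pd.DataFrame:
    """Per-feature skew metrics between training features and logged serving features."""
    rows = []
    for col in train_df.columns.intersection(serve_df.columns):
        if not pd.api.types.is_numeric_dtype(train_df[col]):
            continue  # categorical features need a frequency-based test (e.g. chi-squared) instead
        stat, p_value = ks_2samp(train_df[col].dropna(), serve_df[col].dropna())
        score = psi(train_df[col], serve_df[col])
        rows.append({
            "feature": col,
            "ks_stat": stat,
            "ks_p_value": p_value,
            "psi": score,
            "alert": score > PSI_ALERT or p_value < KS_P_ALERT,
        })
    return pd.DataFrame(rows).sort_values("psi", ascending=False)
```

Run it on a schedule against a rolling window of serving logs and alert on flagged features; the same comparison in shadow mode before launch catches skew before any user sees a prediction.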
follow-up
- What is point-in-time correctness in a feature store and why is it essential for preventing data leakage?
- How would you detect train-serve skew automatically in a production ML system?
- Your model's performance degrades two weeks after launch, but training metrics still look fine. How do you distinguish train-serve skew from concept drift?