mlprep
ML Breadth · hard · 12 min

What does it mean for a model to be calibrated? How would you detect miscalibration, and what techniques exist to fix it?

formulate your answer, then read on

tldr

Calibration: a predicted probability p means that among cases assigned p, roughly a fraction p are actually positive. Detect miscalibration with reliability diagrams (binned predicted probability vs. empirical frequency) and the expected calibration error (ECE). Fix it with Platt scaling (a logistic regression on the scores, simple), isotonic regression (more flexible, needs more data), or temperature scaling (a single parameter, the standard choice for neural nets). Always fit the calibrator on held-out data, never the training set. Calibration matters when the probabilities themselves drive decisions (thresholds, expected costs); tasks judged purely by ranking metrics like AUC don't need it.
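A minimal sketch of the two halves, assuming NumPy/SciPy and access to held-out logits and labels; the names `expected_calibration_error` and `fit_temperature` are illustrative, not from any particular library:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import softmax

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: bin predictions by confidence, compare mean confidence to
    empirical accuracy in each bin, and weight the gaps by bin size."""
    confidences = probs.max(axis=1)          # top-class probability
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(accuracies[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap          # bin weight * calibration gap
    return ece

def fit_temperature(logits, labels):
    """Temperature scaling: find T > 0 minimizing NLL of softmax(logits / T)
    on held-out data. One scalar, so the argmax (and AUC) cannot change."""
    def nll(T):
        scaled = softmax(logits / T, axis=1)
        return -np.log(scaled[np.arange(len(labels)), labels] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

# usage (hypothetical arrays): fit T on validation, evaluate on test
# T = fit_temperature(val_logits, val_labels)
# calibrated = softmax(test_logits / T, axis=1)
# print(expected_calibration_error(calibrated, test_labels))
```

For non-neural models, scikit-learn's CalibratedClassifierCV with method="sigmoid" (Platt) or method="isotonic" applies the same held-out-fit idea without hand-rolling anything.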

follow-up

  • A random forest achieves 0.93 AUC but its probabilities are badly calibrated. Would you bother calibrating it? What does that depend on?
  • How does temperature scaling differ from Platt scaling, and why is temperature scaling preferred for large neural networks?
  • You calibrated your model on a validation set. Three months later, the model is deployed in production. Do you expect calibration to hold?