Walk me through precision, recall, F1, AUC-ROC, and AUC-PR. When would you use each, and what does each actually measure?
formulate your answer first, then check the tldr below.
tldr
Precision = of everything flagged positive, the fraction that truly is; it drops as FP cost you. Recall = of all actual positives, the fraction you caught; it drops as FN cost you. F1 = harmonic mean of the two; it stays low unless both are high, so a lopsided precision/recall trade-off is punished. AUC-ROC measures ranking quality (the probability a random positive outscores a random negative); random = 0.5, but it reads misleadingly high under class imbalance because FPR barely moves when negatives dominate. AUC-PR is the better summary for imbalanced problems; its random baseline is the positive-class prevalence, not 0.5. Always pick the operating threshold from business costs, not the default 0.5.
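To make the tldr concrete, here is a minimal numpy sketch on synthetic imbalanced data (the score distributions, 1% prevalence, and the 1.0 cutoff are all invented for illustration). It computes the threshold metrics by hand, AUC-ROC via its rank (Mann-Whitney) formulation, and AUC-PR as average precision; on the same scores, AUC-ROC comes out high while AUC-PR stays far lower.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
prevalence = 0.01                         # heavy class imbalance: ~1% positives
y = (rng.random(n) < prevalence).astype(int)
# hypothetical model: positives score ~1.5 sigma above negatives
scores = rng.normal(loc=1.5 * y, scale=1.0)

# threshold metrics at an arbitrary cutoff of 1.0
pred = (scores > 1.0).astype(int)
tp = np.sum((pred == 1) & (y == 1))
fp = np.sum((pred == 1) & (y == 0))
fn = np.sum((pred == 0) & (y == 1))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# AUC-ROC = P(random positive outscores random negative)
pos, neg = scores[y == 1], scores[y == 0]
auc_roc = np.mean(pos[:, None] > neg[None, :])

# AUC-PR as average precision: mean of precision@k over the positives,
# scanning the ranking from highest score down
order = np.argsort(-scores)
y_sorted = y[order]
cum_tp = np.cumsum(y_sorted)
prec_at_k = cum_tp / np.arange(1, n + 1)
auc_pr = np.sum(prec_at_k * y_sorted) / y_sorted.sum()

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"AUC-ROC={auc_roc:.3f}  AUC-PR={auc_pr:.3f}  prevalence={prevalence}")
```

The gap between the two AUCs is the tldr's point: the FPR denominator is the huge negative class, so AUC-ROC looks flattering, while AUC-PR is anchored to the prevalence baseline and exposes the weak precision.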
follow-up
- Your medical screening model has 95% recall but only 10% precision. Is this acceptable? How do you explain the trade-off to a non-technical stakeholder?
- How do you compare two models when one has higher AUC-ROC but the other has higher AUC-PR?
- What is the Matthews Correlation Coefficient (MCC) and when is it preferred over F1?
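As a warm-up for the MCC follow-up, a pure-stdlib sketch (the confusion-matrix counts are made up for illustration): MCC uses all four cells of the confusion matrix, so it is invariant to swapping which class you call "positive", while F1 ignores TN and can change drastically under the same swap.

```python
import math

def confusion_metrics(tp: int, fp: int, fn: int, tn: int):
    """Return (F1, MCC) from the four confusion-matrix cells."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return f1, mcc

# hypothetical imbalanced run: 100 actual positives, 9900 actual negatives
f1_a, mcc_a = confusion_metrics(tp=90, fp=910, fn=10, tn=8990)
# the same predictions with the class labels swapped (tp<->tn, fp<->fn)
f1_b, mcc_b = confusion_metrics(tp=8990, fp=10, fn=910, tn=90)

print(f"minority as positive: F1={f1_a:.3f} MCC={mcc_a:.3f}")
print(f"majority as positive: F1={f1_b:.3f} MCC={mcc_b:.3f}")
```

F1 jumps from roughly 0.16 to roughly 0.95 depending on the labeling choice; MCC is identical in both cases, which is why it is often preferred on imbalanced data where "which class is positive" is arbitrary.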