You're building a fraud detection model. Only 0.1% of transactions are fraud. How do you handle this class imbalance?
formulate your answer, then —
tldr
Class imbalance approaches: (1) class-weighted loss: simplest, no data modification, just upweight minority-class errors in the objective; (2) SMOTE oversampling: synthesizes minority examples by interpolating between nearest neighbors; apply it only to the training split, and note it distorts probability calibration; (3) threshold tuning: train normally, then move the decision threshold post hoc to trade precision against recall; (4) anomaly detection when the minority class is tiny or its patterns drift. Evaluate with AUC-PR, not AUC-ROC. Accuracy is useless at 1:1000 ratios: a model that predicts "not fraud" for every transaction already scores 99.9%.
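Approaches (1) and (3) can be sketched together in plain numpy. This is a toy illustration, not a fraud pipeline: the data, the 1% prior (milder than the 0.1% in the question, to keep the demo stable), and the `fit_logreg` helper are all invented here. Class weighting is implemented as per-sample weights inside the log-loss gradient, which is how libraries like scikit-learn's `class_weight` realize it.

```python
import numpy as np

# Toy imbalanced data: ~1% positives, one informative feature.
rng = np.random.default_rng(0)
n = 5000
y = (rng.random(n) < 0.01).astype(float)
X = rng.normal(loc=2.0 * y, scale=1.0, size=n).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, class_weight=None, lr=0.3, steps=4000):
    """Logistic regression by gradient descent; per-sample weights
    implement the class-weighted log loss (approach 1)."""
    w, b = np.zeros(X.shape[1]), 0.0
    if class_weight is None:
        sw = np.ones_like(y)
    else:
        sw = np.where(y == 1, class_weight[1], class_weight[0])
    sw = sw / sw.mean()
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        g = sw * (p - y)                 # weighted gradient of log loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# "balanced" heuristic: n / (n_classes * count_per_class)
n_pos = y.sum()
cw = {0: n / (2 * (n - n_pos)), 1: n / (2 * n_pos)}

w_plain, b_plain = fit_logreg(X, y)
w_bal, b_bal = fit_logreg(X, y, class_weight=cw)
p_plain = sigmoid(X @ w_plain + b_plain)
p_bal = sigmoid(X @ w_bal + b_bal)

recall = lambda p, t: ((p >= t) & (y == 1)).sum() / n_pos

# The unweighted model at the default 0.5 threshold flags almost nothing;
# the weighted model recovers recall at the same threshold.
print(recall(p_plain, 0.5), recall(p_bal, 0.5))

# Approach (3): keep the unweighted model, lower the threshold post hoc
# (here: flag the top 2% of scores instead of using 0.5).
t = np.quantile(p_plain, 0.98)
print(recall(p_plain, t))
```

Note that (1) and (3) are closely related: reweighting shifts the learned intercept, while threshold tuning shifts the cutoff on an unchanged model; with logistic regression the two largely coincide.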
follow-up
- Why is AUC-ROC misleading for imbalanced classification, and why is AUC-PR better?
- SMOTE creates interpolated examples between minority class neighbors. What can go wrong with this assumption?
- How do you recalibrate a model's predicted probabilities after oversampling to reflect the true class prior?
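The first follow-up can be made concrete with a toy experiment (the data and helper functions below are invented for illustration). ROC AUC equals the probability that a random positive outscores a random negative, so the class ratio cancels out of it entirely; average precision does not have that property, because precision competes against every extra negative.

```python
import numpy as np

rng = np.random.default_rng(1)

def auc_roc(scores_pos, scores_neg):
    """ROC AUC via its Mann-Whitney interpretation:
    P(random positive scores above random negative).
    The class ratio cancels out completely."""
    return (scores_pos[:, None] > scores_neg[None, :]).mean()

def average_precision(scores_pos, scores_neg):
    """Area under the precision-recall curve, sweeping the threshold
    down the sorted scores; each positive adds 1/P to recall."""
    y = np.concatenate([np.ones(len(scores_pos)), np.zeros(len(scores_neg))])
    s = np.concatenate([scores_pos, scores_neg])
    y = y[np.argsort(-s)]
    precision = np.cumsum(y) / np.arange(1, len(y) + 1)
    recall_steps = y / y.sum()
    return (precision * recall_steps).sum()

pos = rng.normal(1.0, 1.0, 100)            # same score distribution
neg_small = rng.normal(0.0, 1.0, 100)      # balanced: 1:1
neg_large = rng.normal(0.0, 1.0, 100_000)  # imbalanced: 1:1000

# Identical score distributions, so ROC AUC barely moves...
print(auc_roc(pos, neg_small), auc_roc(pos, neg_large))
# ...while average precision collapses under 1000x more negatives.
print(average_precision(pos, neg_small), average_precision(pos, neg_large))
```

This is the core of the answer: a classifier can look excellent on ROC AUC while flagging mostly false positives at any usable operating point, and AUC-PR exposes that.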
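For the last follow-up, one standard answer is prior-shift correction (often attributed to Elkan, 2001): if oversampling changed only the class prior, from the true prior pi_t to the training prior pi_s, and left the class-conditional densities alone, Bayes' rule maps a training-time probability p_s back to p_t = (p_s * pi_t/pi_s) / (p_s * pi_t/pi_s + (1 - p_s) * (1 - pi_t)/(1 - pi_s)). A minimal sketch under that assumption (function name is mine):

```python
import numpy as np

def recalibrate(p_s, pi_s, pi_t):
    """Map probabilities calibrated under the resampled training prior
    pi_s back to the deployment prior pi_t (prior-shift correction).
    Assumes resampling changed only the prior, not the class-conditional
    feature distributions."""
    num = p_s * pi_t / pi_s
    den = num + (1.0 - p_s) * (1.0 - pi_t) / (1.0 - pi_s)
    return num / den

# After oversampling fraud to 50/50, a model output of 0.5 corresponds
# to a true probability of only 0.001 when fraud is really 0.1%:
p = recalibrate(np.array([0.5, 0.9, 0.99]), pi_s=0.5, pi_t=0.001)
print(p)
```

Note this rescales probabilities monotonically, so it changes calibration but not ranking; if SMOTE also distorted the score distribution's shape, a learned recalibration (Platt scaling or isotonic regression on an untouched validation split) is the fallback.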