mlprep
ML Breadth · medium · 10 min

How does a decision tree decide where to split? What's the difference between Gini impurity and information gain, and which should you use?

formulate your answer, then read on

tldr

Decision trees split greedily: at each node, try every feature/threshold pair and keep the split with the largest impurity reduction. Gini impurity = 1 − Σ_c p_c², the probability of mislabeling a sample drawn and labeled at random from the node's class distribution. Information gain = reduction in entropy, where entropy = −Σ_c p_c log₂ p_c. The two criteria produce nearly identical trees in practice; Gini is slightly faster because it avoids the log. Regression trees minimize variance (i.e., MSE) instead. Prevent overfitting via max_depth, min_samples_leaf, or cost-complexity pruning. A single tree is a high-variance weak learner; the real power comes from ensembling (bagging, boosting).
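A minimal from-scratch sketch of the greedy split search described above: compute the parent node's impurity, scan candidate thresholds on one numeric feature, and keep the split with the largest weighted impurity reduction. The names gini, entropy, and best_split are illustrative, and this handles a single feature; real implementations (e.g. scikit-learn's Cython splitter) scan all features at every node.

```python
import numpy as np

def gini(y):
    """Gini impurity: 1 - sum_c p_c^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(y):
    """Shannon entropy: -sum_c p_c log2(p_c)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(x, y, impurity=gini):
    """Scan midpoints between consecutive sorted values of one numeric
    feature; return (threshold, impurity_reduction) for the best split."""
    parent = impurity(y)
    best = (None, 0.0)
    vals = np.unique(x)
    for t in (vals[:-1] + vals[1:]) / 2:  # candidate midpoints
        left, right = y[x <= t], y[x > t]
        w_left, w_right = len(left) / len(y), len(right) / len(y)
        gain = parent - (w_left * impurity(left) + w_right * impurity(right))
        if gain > best[1]:
            best = (t, gain)
    return best

x = np.array([2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 1, 1, 1])
print(best_split(x, y, gini))     # (6.5, 0.48): a perfect split here
print(best_split(x, y, entropy))  # same threshold; gain measured in bits
```

Note that both criteria pick the same threshold on this toy data, which is the usual story: the rankings of candidate splits rarely differ enough to change the tree.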

follow-up

  • What is cost-complexity pruning and how does it differ from setting max_depth? (See the sketch after this list.)
  • How does a decision tree handle numeric vs categorical features differently?
  • Why do decision trees partition the feature space into axis-aligned rectangles? What does this mean for modeling circular or diagonal decision boundaries?
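For the first follow-up, a sketch contrasting the two regularizers using scikit-learn's ccp_alpha and cost_complexity_pruning_path: max_depth stops the tree from growing in the first place, while cost-complexity pruning grows the full tree and then collapses subtrees whose impurity improvement doesn't justify their leaf count under the penalty α. The dataset choice and the alpha subsampling below are illustrative; in practice you would pick α by cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# max_depth regularizes a priori: the tree simply never grows past the limit.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print(f"max_depth=3: leaves={shallow.get_n_leaves()}  "
      f"test acc={shallow.score(X_te, y_te):.3f}")

# Cost-complexity pruning works post hoc: grow the full tree, then compute the
# sequence of subtrees that are optimal for increasing values of alpha.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
for alpha in path.ccp_alphas[::5]:  # sample a few alphas along the path
    pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    print(f"alpha={alpha:.4f}: leaves={pruned.get_n_leaves()}  "
          f"test acc={pruned.score(X_te, y_te):.3f}")
```

The key talking point: max_depth is a blunt, uniform cap, while pruning removes only the branches that fail to pay for their complexity, so it can keep one deep, genuinely useful branch while cutting many shallow, noisy ones.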