You have a ranking task with 500 tabular features and 200M training examples. When would you pick GBDT over a deep neural network, and what signals would make you switch?
Formulate your own answer first, then read on.
tldr
GBDT wins on heterogeneous tabular data: it handles missing values natively, discovers feature interactions automatically, trains fast, and stays relatively interpretable. Switch to a DNN when you have high-cardinality ID features (which need embeddings), sequences, or content signals (text/images). At 200M examples, start with LightGBM for speed and iteration velocity, then add a neural component once ID embeddings or content features become necessary. Production systems often use both (Wide & Deep, GBDT + embedding stacking).
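
A minimal sketch of the "start with LightGBM" baseline, using its LambdaMART-style ranker. The data here is synthetic and the query-group layout (100 queries x 100 docs) is an illustrative assumption, not part of the original question; at 200M real examples you would stream data and tune parameters properly.

```python
# Minimal LightGBM ranking baseline (LambdaMART via LGBMRanker).
# Synthetic data stands in for the 500 tabular features; NaNs are left
# in place because LightGBM handles missing values natively.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)

n_rows, n_features = 10_000, 500
X = rng.normal(size=(n_rows, n_features))
X[rng.random(X.shape) < 0.05] = np.nan      # simulate missing values
y = rng.integers(0, 4, size=n_rows)         # graded relevance labels 0-3

# group = number of documents per query, in row order (assumed layout:
# 100 queries with 100 candidate docs each).
group = np.full(100, 100)

ranker = lgb.LGBMRanker(
    objective="lambdarank",
    n_estimators=200,
    learning_rate=0.1,
    num_leaves=255,      # larger leaf budget is typical at this data scale
    n_jobs=-1,
)
ranker.fit(X, y, group=group)

scores = ranker.predict(X[:100])            # scores for one query's candidates
print(scores[:5])
```

At production scale the same pattern holds: train the GBDT on the dense tabular features first, then feed its scores (or leaf indices) alongside learned embeddings into a neural layer if the stacked setup is warranted.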
follow-up
- What is the Wide & Deep architecture and what problem does each component solve?
- How would you combine GBDT and DNN predictions — ensemble, stacking, or something else?
- LightGBM trains faster than XGBoost on large datasets. What algorithmic difference causes this?