When is reaching for a large foundation model the wrong choice? Walk me through how you'd decide between a logistic regression, a gradient boosted tree, a fine-tuned BERT, and GPT-4 for a new ML task.
formulate your answer, then —
tldr
Reach for simpler models first: GBDT for tabular data, fine-tuned BERT for NLP classification, logistic regression for latency-critical scoring. Foundation models are justified when the task is open-ended or generative, labeled data is scarce, you need world knowledge or complex reasoning, and request volume is low enough to absorb the cost. At 10M requests/day, GPT-4 costs on the order of 1000× more than a small fine-tuned model. Distillation bridges the gap: label with the large model, serve the small one. Finally, weigh whether the organization can actually maintain whatever you deploy.
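One way to frame that triage is as code. A heuristic sketch only: the `task` fields, the 10ms latency cutoff, the 1k-label threshold, and the 100k/day volume cutoff are all assumptions, not rules from the answer above.

```python
def pick_model(task):
    # Heuristic triage mirroring the tldr; every threshold here is an
    # assumed placeholder to make the priorities concrete.
    if task.is_tabular:
        return "gradient-boosted trees"
    if task.latency_budget_ms < 10:
        return "logistic regression"  # cheap, fast, auditable scorer
    foundation_worthwhile = (
        (task.is_generative
         or task.needs_world_knowledge
         or task.labeled_examples < 1_000)
        and task.requests_per_day < 100_000  # volume low enough to absorb cost
    )
    if foundation_worthwhile:
        return "GPT-4 (consider distilling once volume grows)"
    return "fine-tuned BERT"  # closed-set NLP task with labels available
```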
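The back-of-envelope math behind the 1000× claim. The per-token prices and 500 tokens/request below are illustrative assumptions, not current quotes; plug in real rates before using this in anger.

```python
# Rough cost comparison at 10M requests/day.
REQUESTS_PER_DAY = 10_000_000
TOKENS_PER_REQUEST = 500  # assumed prompt + completion

# Assumed blended prices per 1M tokens (placeholders).
GPT4_PER_1M_TOKENS = 30.00        # large hosted model
SMALL_PER_1M_TOKENS = 0.03        # self-hosted fine-tuned BERT-class model

def daily_cost(price_per_1m_tokens: float) -> float:
    # Total tokens/day, in millions, times the per-million price.
    return REQUESTS_PER_DAY * TOKENS_PER_REQUEST / 1e6 * price_per_1m_tokens

gpt4 = daily_cost(GPT4_PER_1M_TOKENS)    # -> $150,000/day
small = daily_cost(SMALL_PER_1M_TOKENS)  # -> $150/day
print(f"GPT-4: ${gpt4:,.0f}/day  small: ${small:,.0f}/day  ratio: {gpt4/small:,.0f}x")
```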
follow-up
- How would you use knowledge distillation to compress a GPT-4-level capability into a BERT-scale model for a classification task? (a sketch follows this list)
- Your team wants to use RAG for a customer support bot. How do you decide whether RAG is a better fit than fine-tuning, and how would you evaluate either?
- A PM asks why you're not just using GPT-4 for everything. Walk me through the cost and reliability argument without dismissing the capability.
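A minimal sketch for the distillation follow-up, assuming the large model has already labeled an unlabeled corpus offline. The example texts, the teacher labels, the epoch count, and the `distilbert-base-uncased` student are all placeholder assumptions.

```python
# Hard-label distillation: the large model labels the corpus offline,
# then a small student is fine-tuned on those labels and served instead.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

texts = ["refund not processed", "love the new update"]  # unlabeled corpus
teacher_labels = [0, 1]  # produced offline by prompting the large model

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
student = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

enc = tok(texts, padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor(teacher_labels)

opt = torch.optim.AdamW(student.parameters(), lr=2e-5)
student.train()
for _ in range(3):  # a few epochs over the teacher-labeled set
    out = student(**enc, labels=labels)  # cross-entropy on teacher labels
    out.loss.backward()
    opt.step()
    opt.zero_grad()
```

If the teacher exposes token logprobs, this upgrades to classic soft-target distillation (KL divergence against the teacher's distribution); with hard labels only, plain cross-entropy as above is the common form.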