mlprep
ML Breadth · hard · 12 min

When is reaching for a large foundation model the wrong choice? Walk me through how you'd decide between a logistic regression, a gradient boosted tree, a fine-tuned BERT, and GPT-4 for a new ML task.

formulate your answer first, then compare with the summary below

tldr

Reach for simpler models first: gradient boosted trees for tabular data, a fine-tuned BERT for NLP classification, logistic regression for latency-critical scoring. A large foundation model is justified when the task is open-ended or generative, labeled data is scarce, the task needs world knowledge or multi-step reasoning, and request volume is low enough to absorb the cost. At 10M requests/day, GPT-4 costs on the order of 1000× more to serve than a small fine-tuned model. Distillation bridges the gap: label training data with the large model, then serve a small model trained on those labels. Finally, weigh organizational maturity: your team has to maintain whatever it deploys.
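The cost claim is worth being able to reproduce on a whiteboard. A back-of-envelope sketch, where the per-token prices and the 500 tokens/request figure are illustrative assumptions rather than current vendor quotes:

```python
# Back-of-envelope serving-cost comparison at interview-question scale.
# All prices below are illustrative assumptions, not real quotes.
REQUESTS_PER_DAY = 10_000_000
TOKENS_PER_REQUEST = 500  # assumed prompt + completion

large_model_per_1k = 0.03     # assumed $/1K tokens, GPT-4-class API
small_model_per_1k = 0.00003  # assumed amortized $/1K tokens, self-hosted small model

def daily_cost(price_per_1k_tokens: float) -> float:
    """Total daily spend given a $/1K-token price."""
    return REQUESTS_PER_DAY * TOKENS_PER_REQUEST / 1000 * price_per_1k_tokens

large = daily_cost(large_model_per_1k)   # ~$150,000/day
small = daily_cost(small_model_per_1k)   # ~$150/day
print(f"large: ${large:,.0f}/day, small: ${small:,.0f}/day, ratio {large / small:.0f}x")
```

The exact prices will drift, but the structure of the argument does not: cost scales linearly with volume, so a fixed per-request price ratio becomes a 1000× budget gap at high QPS.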

follow-up

  • How would you use knowledge distillation to compress a GPT-4-level capability into a BERT-scale model for a classification task?
  • Your team wants to use RAG for a customer support bot. How do you decide if RAG is better than fine-tuning, and how would you evaluate either?
  • A PM asks why you're not just using GPT-4 for everything. Walk me through the cost and reliability argument without dismissing the capability.
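The first follow-up (distill a large model's capability into a small classifier) can be sketched end to end. This is a minimal stdlib-only illustration: `query_teacher` is a hypothetical stand-in for a GPT-4-class API call with a constrained labeling prompt, and the student is a toy bag-of-words perceptron standing in for a BERT-scale model:

```python
# Distillation sketch: a stand-in "teacher" pseudo-labels unlabeled text,
# then a cheap student model is trained on those labels for serving.
from collections import Counter

def query_teacher(text: str) -> int:
    # Hypothetical placeholder for a large-model API call that
    # returns a class label (e.g. via a constrained prompt).
    return int("refund" in text or "broken" in text)

unlabeled = [
    "my package arrived broken",
    "how do i reset my password",
    "i want a refund for this order",
    "what are your opening hours",
    "the item came broken again",
    "where is my refund",
]
pseudo = [(t, query_teacher(t)) for t in unlabeled]  # teacher-generated labels

# Student: bag-of-words perceptron, cheap enough for high-QPS serving.
vocab = {w for t, _ in pseudo for w in t.split()}
def featurize(text: str) -> Counter:
    return Counter(w for w in text.split() if w in vocab)

weights = {w: 0.0 for w in vocab}
bias = 0.0
for _ in range(20):  # perceptron training epochs over the pseudo-labels
    for text, y in pseudo:
        feats = featurize(text)
        pred = int(bias + sum(weights[w] * c for w, c in feats.items()) > 0)
        if pred != y:  # standard perceptron update on mistakes
            for w, c in feats.items():
                weights[w] += (y - pred) * c
            bias += (y - pred)

def student_predict(text: str) -> int:
    feats = featurize(text)
    return int(bias + sum(weights.get(w, 0.0) * c for w, c in feats.items()) > 0)

print(student_predict("item broken on arrival"))  # → 1, agreeing with the teacher
```

In a real pipeline the same shape holds: batch-label a large unlabeled corpus with the expensive model once, audit a sample of its labels, fine-tune the small model, and keep the teacher only for offline relabeling and evaluation.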