What problem does DSPy solve that manual prompting doesn't? How does its compilation step work, and when would you use DSPy over fine-tuning?
formulate your answer, then —
tldr
DSPy replaces manual prompt engineering with program compilation: you define the pipeline structure and a metric, and an optimizer searches over few-shot examples and instruction variants to maximize that metric on your data. It is useful when fine-tuning is impractical, the pipeline changes often, or labeled data is limited. Quality is bounded by your metric and dev set: the optimizer only improves what the metric measures. Fine-tuning tends to win when labeled data is plentiful and latency matters, since a fine-tuned model does not need long few-shot prompts at inference time.
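To make the compilation idea concrete, here is a minimal pure-Python sketch of metric-driven demo selection. It is not the DSPy API. `toy_lm`, `exact_match`, `evaluate`, and `compile_program` are hypothetical names, and the "LM" is a keyword matcher standing in for a real model call. Real optimizers such as BootstrapFewShot call an actual LM and bootstrap demos from program traces; this sketch simply tries every demo subset and keeps the one that scores best on the dev set.

```python
from itertools import combinations


def toy_lm(question, demos):
    """Hypothetical stand-in for an LLM call: copies the label of the first
    demo that shares a word with the question, else answers 'unknown'."""
    words = set(question.lower().split())
    for demo_q, demo_a in demos:
        if words & set(demo_q.lower().split()):
            return demo_a
    return "unknown"


def exact_match(gold, pred):
    """The metric: 1.0 when the prediction equals the gold label."""
    return float(gold == pred)


def evaluate(demos, devset):
    """Average metric of the pipeline on the dev set for a given demo set."""
    scores = [exact_match(gold, toy_lm(q, demos)) for q, gold in devset]
    return sum(scores) / len(scores)


def compile_program(candidates, devset, k=2):
    """Try every k-demo subset and keep the one with the best dev score."""
    best_demos, best_score = (), evaluate((), devset)
    for demos in combinations(candidates, k):
        score = evaluate(demos, devset)
        if score > best_score:
            best_demos, best_score = demos, score
    return best_demos, best_score


# Toy data (assumed for illustration): candidate demos and a held-out dev set.
trainset = [
    ("refund my invoice", "billing"),
    ("app crashes on login", "tech"),
    ("hello there", "other"),
]
devset = [
    ("invoice shows wrong amount", "billing"),
    ("login page crashes", "tech"),
]

best_demos, best_score = compile_program(trainset, devset, k=2)
```

The search discards the uninformative "hello there" demo because it never raises the dev score. A weak metric or an unrepresentative dev set would steer the same search toward poor demos, which is why both matter so much.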
follow-up
- How would you design a metric function for DSPy to optimize a RAG pipeline for a customer support use case?
- When does BootstrapFewShot outperform MIPRO, and what extra information does MIPRO use?
- You compiled a DSPy program last month. The retrieval index changed. Do you need to recompile?