Fine-tune, retrieve, or prompt: a decision tree.
Nine out of ten clients arrive convinced they need to fine-tune a model. Nine out of ten do not. A practical decision tree for when each approach earns its keep.
There is no best approach. There is only the right approach for the kind of gap you are closing. The question is not "fine-tune or RAG" — the question is what you are trying to teach the model, and then which tool is built for that lesson.
The three kinds of gap
Every LLM feature fails in one of three ways. Knowing which kind of failure you have tells you which tool to reach for.
Knowledge gap — the model does not know your facts.
Your policy is three weeks old. The product doc is proprietary. The customer's address is in a CRM the model has never seen. Fine-tuning will not help; the model will forget it, confuse it with its training data, and hallucinate confidently. Retrieval is built for this. Build a RAG pipeline.
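A minimal sketch of the pipeline shape, assuming nothing beyond the standard library: a toy keyword scorer stands in for a real embedding index, and the final prompt is what you would hand to the model. All function and variable names here are illustrative, not any library's API.

```python
# Toy RAG pipeline: retrieve the facts, then ground the model in them.

def score(query: str, chunk: str) -> int:
    """Crude relevance: count query words that appear in the chunk.
    A real pipeline would use embeddings, not substring matching."""
    words = set(query.lower().split())
    return sum(1 for w in words if w in chunk.lower())

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k most relevant chunks for the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Ground the answer in retrieved facts instead of the model's weights."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only these facts:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are issued within 14 days of purchase.",
    "Enterprise plans include a dedicated support channel.",
    "The warehouse ships orders Monday through Friday.",
]
top = retrieve("what is the refund window", docs)
prompt = build_prompt("What is the refund window?", top)
```

The point is the division of labour: the facts live in `docs` (your CRM, your policy doc), and the model only ever sees what retrieval hands it, so a three-week-old policy change is a re-index, not a retrain.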
Format gap — the model knows the answer but writes it wrong.
You want outputs in your specific JSON schema. You want tone-matched copy. You want the model to always end with a particular CTA. Prompt first — system message, a few-shot examples, structured output. Only escalate to fine-tuning if prompting still underperforms after an honest attempt, and an honest attempt usually closes the gap.
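What "prompt first" looks like in practice, as a sketch: a system message carrying the schema, two few-shot exchanges, then the real input. The message structure mirrors common chat APIs but nothing is sent here; the schema and examples are invented for illustration.

```python
import json

# The schema you want every reply to satisfy.
SCHEMA = {
    "type": "object",
    "properties": {
        "sentiment": {"enum": ["pos", "neg", "neutral"]},
        "confidence": {"type": "number"},
    },
    "required": ["sentiment", "confidence"],
}

def build_messages(text: str) -> list[dict]:
    """System message + few-shot examples + the actual input."""
    few_shot = [
        ("Loved the onboarding flow.", {"sentiment": "pos", "confidence": 0.9}),
        ("Checkout crashed twice.", {"sentiment": "neg", "confidence": 0.95}),
    ]
    messages = [{
        "role": "system",
        "content": "Reply only with JSON matching this schema: " + json.dumps(SCHEMA),
    }]
    for example_in, example_out in few_shot:
        messages.append({"role": "user", "content": example_in})
        messages.append({"role": "assistant", "content": json.dumps(example_out)})
    messages.append({"role": "user", "content": text})
    return messages

msgs = build_messages("The docs are fine, I guess.")
```

If the provider supports structured output or JSON mode, pass the schema there too; the few-shot examples then teach tone and edge cases while the decoder enforces the format.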
Skill gap — the model cannot do the task at all.
You are classifying something subtle, scoring something nuanced, translating in a domain where the off-the-shelf model is measurably bad. This is where fine-tuning earns its keep. You have a labelled dataset, you have a clear metric, you have evidence that prompting has topped out. Fine-tune — and re-evaluate every time a new base model ships.
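The two prerequisites named above, sketched concretely: a labelled dataset and a clear metric. The record fields and labels are invented examples, not any vendor's fine-tuning format; the eval function is the thing you run on the prompted baseline first, and again every time a new base model ships.

```python
# A labelled dataset: the raw material for fine-tuning AND for the eval
# that proves you need it. Field names here are illustrative.
dataset = [
    {"input": "Patient reports mild dyspnea on exertion.", "label": "respiratory"},
    {"input": "Intermittent palpitations at rest.", "label": "cardiac"},
    {"input": "Chronic lower-back pain, no radiation.", "label": "musculoskeletal"},
]

def accuracy(predict, examples) -> float:
    """The clear metric: run it on the prompted baseline before you
    fine-tune, and on every new base model after you do."""
    hits = sum(1 for ex in examples if predict(ex["input"]) == ex["label"])
    return hits / len(examples)

# Stub standing in for the prompted off-the-shelf model.
baseline = lambda text: "cardiac"
baseline_score = accuracy(baseline, dataset)
```

If `baseline_score` is already at target, stop: you have a prompting problem that is solved. Fine-tuning only enters when this number has demonstrably topped out.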
The decision tree
1. Can a good prompt plus a few-shot example get you to acceptable quality? Ship that. Done.
2. Does the model need facts it does not have? Add retrieval. Re-evaluate.
3. Does the model still get the format wrong? Structured outputs, JSON mode, or constrained decoding.
4. Is the task itself beyond the base model? Now fine-tune — and only the smallest model that will do the job, not the largest.
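The four questions above collapse into a straight-line function. The flag names are hypothetical; the point is the order of the checks — prompt, retrieve, constrain the format, and only then fine-tune.

```python
def choose_approach(prompt_is_enough: bool,
                    missing_facts: bool,
                    format_still_wrong: bool,
                    task_beyond_model: bool) -> str:
    """The decision tree as code: earlier branches win."""
    if prompt_is_enough:
        return "prompt"                    # 1. ship that, done
    if missing_facts:
        return "prompt + retrieval"        # 2. add RAG, re-evaluate
    if format_still_wrong:
        return "structured outputs"        # 3. JSON mode / constrained decoding
    if task_beyond_model:
        return "fine-tune smallest model"  # 4. the last resort
    return "re-run your evals"             # no clear gap: measure again
```

Note what the ordering encodes: you never reach fine-tuning while a cheaper branch still answers yes.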
“Fine-tuning is a commitment. You own the model, the dataset, the eval, the drift. Take it on when the economics justify it — not when a vendor's marketing does.”
The shelf-life question
Every frontier release shrinks the fine-tuning use case. Things that required fine-tuning in 2023 are prompt-engineerable in 2026. Before you fine-tune, ask: will the next base model make this unnecessary? If the answer is probably yes within twelve months, don't.