RAG, fine-tuning, and prompting are constantly described as alternatives. They're not. They do different jobs. Reaching for the wrong one is the single most common reason AI projects over-invest and under-deliver.
Here’s the shortest correct summary of when to reach for each.
Prompting: for behavior
Prompting (including system prompts, few-shot examples, and templated instructions) is how you shape behavior — tone, format, decision procedure, refusal policy. It’s where you teach the model how to think about this kind of request, not what facts to use.
Use prompting when: the behavior is general-purpose, when it changes monthly or more, when you need zero deployment lag, and when you want transparency about the rules the model is following. Almost every production system relies on well-crafted prompts as its foundation.
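As a minimal sketch of what "shaping behavior in the prompt" looks like in practice: behavioral rules live in a system prompt, optionally followed by a few-shot example, then the live request. The product name, rules, and examples below are invented; the message-list shape follows the common chat-completions convention.

```python
SYSTEM_PROMPT = """You are a support assistant for Acme (hypothetical).
Rules:
- Answer in at most three sentences.
- Use a friendly, plain-spoken tone.
- If the user asks for a refund, point them to the refund form; never promise one.
"""

# One few-shot exchange demonstrating the desired behavior.
FEW_SHOT = [
    {"role": "user", "content": "Can I get my money back?"},
    {"role": "assistant", "content": "You can request a refund through our "
     "refund form; the team reviews requests within two business days."},
]

def build_messages(user_input: str) -> list[dict]:
    """Assemble system rules, few-shot examples, and the live request."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        *FEW_SHOT,
        {"role": "user", "content": user_input},
    ]

messages = build_messages("My package arrived damaged.")
```

Changing a rule here is a text edit with zero deployment lag, which is exactly why behavior belongs in the prompt rather than in weights.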
RAG: for knowledge that changes
Retrieval-Augmented Generation is how you give the model current, specific knowledge at inference time. Your product catalog, your docs, your tickets, your policies. Things that change, things that differ per user, things that are too large to fit in a prompt.
Use RAG when: the information is specific to your organization, when it changes weekly or more, when you need citations back to source, when you need per-user or per-tenant filtering, and when the corpus is larger than a few thousand tokens.
Common mistake: using RAG for behavior. Retrieving five example emails to teach the model to write like your brand almost always works worse than writing a clear style guide in the system prompt.
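The RAG flow above can be sketched end to end: filter the corpus per tenant, rank by similarity to the query, and inject the top hits into the prompt with citation ids. This toy uses word overlap as a stand-in for embedding similarity, and all documents and ids are invented.

```python
# Invented per-tenant corpus; in production this is a vector store.
CORPUS = [
    {"tenant": "acme", "id": "doc-1",
     "text": "Refunds are processed within 5 business days."},
    {"tenant": "acme", "id": "doc-2",
     "text": "Shipping to Canada takes 7 to 10 days."},
    {"tenant": "globex", "id": "doc-3",
     "text": "Refunds require a receipt."},
]

def retrieve(query: str, tenant: str, k: int = 2) -> list[dict]:
    """Per-tenant filter, then rank by word overlap (embedding stand-in)."""
    q = set(query.lower().split())
    docs = [d for d in CORPUS if d["tenant"] == tenant]
    return sorted(
        docs, key=lambda d: -len(q & set(d["text"].lower().split()))
    )[:k]

def build_prompt(query: str, tenant: str) -> str:
    """Stuff retrieved passages, with their ids, ahead of the question."""
    context = "\n".join(f"[{d['id']}] {d['text']}"
                        for d in retrieve(query, tenant))
    return (f"Answer using only these sources, citing their ids:\n"
            f"{context}\n\nQuestion: {query}")

prompt = build_prompt("How long do refunds take?", tenant="acme")
```

Note that the tenant filter runs before ranking, so one customer's documents can never leak into another's prompt, and the ids in the context make citations back to source possible.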
Fine-tuning: for patterns and style that don’t change
Fine-tuning changes the model’s weights to internalize a pattern. It’s slow to iterate, expensive to maintain across model updates, and it can’t teach the model facts reliably.
Use fine-tuning when: the pattern is stable over months, when prompting is not enough (you’ve tried, hard), when you have 500+ high-quality examples, when latency or cost on a smaller fine-tuned model beats prompting on a larger one, and when the task is narrow (classification, extraction, format transformation, strict style imitation).
Common mistake: fine-tuning to inject knowledge. The model will mostly memorize surface patterns and still hallucinate specifics. Use RAG.
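For contrast, here is what fine-tuning data for a narrow, stable task (ticket triage) tends to look like: hundreds of input/label pairs, not a pile of facts. The JSONL chat format below follows a common convention for instruction fine-tuning; the tickets and labels are invented.

```python
import json

# Two of the 500+ examples a real fine-tune would need.
examples = [
    {"messages": [
        {"role": "user", "content": "My invoice total looks wrong."},
        {"role": "assistant", "content": "billing"},
    ]},
    {"messages": [
        {"role": "user", "content": "The app crashes on launch."},
        {"role": "assistant", "content": "bug"},
    ]},
]

# One JSON object per line, the usual fine-tuning file layout.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The labels here are behavior (a classification pattern), not knowledge; a model trained on pairs like these learns the mapping, not your refund policy.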
The decision tree
Ask these questions in order:
- Does the information change week-to-week? If yes → RAG. Move on.
- Is it a behavior/policy/format rule? Start with prompting. 80% of the time, that’s the answer.
- Is prompting not enough despite real effort? Reach for few-shot, then structured outputs, then (only then) fine-tuning.
- Is this a narrow, repeated, stable task where cost/latency matters? Fine-tuning might earn its keep. Benchmark against a cheaper prompted model first.
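The questions above can be sketched as a function. The ordering is the point: knowledge freshness first, prompting as the default, fine-tuning only when prompting has genuinely been exhausted on a narrow, stable task. The flag names are mine, and the heuristics are the article's, not hard rules.

```python
def choose_approach(changes_weekly: bool,
                    is_behavior_rule: bool,
                    prompting_exhausted: bool,
                    narrow_stable_task: bool) -> str:
    """Apply the decision tree in order; prompting is the default."""
    if changes_weekly:
        return "rag"                      # fresh/per-user knowledge
    if is_behavior_rule and not prompting_exhausted:
        return "prompting"                # ~80% of cases end here
    if prompting_exhausted and narrow_stable_task:
        return "fine-tuning"              # benchmark against a prompted model first
    return "prompting"

approach = choose_approach(changes_weekly=True, is_behavior_rule=False,
                           prompting_exhausted=False, narrow_stable_task=False)
```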
Why this matters commercially
Fine-tuning is the most expensive, slowest, and most exciting option — which is why it’s proposed too often. RAG is the second most expensive and the second most often proposed. Prompting is the cheapest, most flexible, and least proposed. A disciplined team reaches for prompting first and only escalates when measurement demands it.
On every engagement we’ve run in the last year, we have ended up with a system that is 70–90% prompting, 10–25% RAG, and 0–5% fine-tuning. That ratio is roughly correct for most B2B use cases today.