Insight · April 7, 2026 · 4 min read

Fine-tuning Checklist

When to fine-tune vs. when to use RAG

Fine-tuning vs RAG

A practical decision framework. Check every statement that's true for your use case — whichever side has more checks is probably your starting point.

Signs you need fine-tuning

Fine-tuning changes how the model behaves. You're updating the model's weights to shift its default patterns — tone, structure, reasoning style, or task-specific behavior. Think of it as teaching the model a new habit.

[ ] You need a specific tone, style, or persona the base model can't match with prompting alone (e.g. matching your brand voice, writing like a radiologist, outputting in a domain-specific format)
[ ] You're doing a narrow, well-defined task like classification, entity extraction, or structured output where you can clearly define "correct"
[ ] You have hundreds or more high-quality labeled input/output examples, and you're confident in their quality, because the model will absorb their mistakes too
[ ] Latency matters: you can't afford the extra retrieval round-trip that RAG adds (embedding lookup + context stuffing + longer prompt)
[ ] The knowledge you need is behavioral (how to respond), not factual (what to say), e.g. "always respond in bullet points with confidence scores" vs "what's our return policy"
[ ] Prompt engineering and few-shot examples have plateaued: you've maxed out what you can fit in the context window and the model still isn't consistent enough
[ ] You want to reduce token costs by replacing long system prompts and few-shot examples with learned behavior
[ ] You need the model to reliably follow a complex output schema (JSON, XML, function calls) without constant prompt reminders

Watch out for these fine-tuning traps:

Fine-tuning on factual knowledge tends to encourage hallucination: the model "sort of" learns the facts but confidently drifts from them. Use RAG for facts.
Small or noisy training sets teach the model your errors, not your intent.
Fine-tuned models freeze at the moment of training. If your knowledge changes monthly, you're retraining monthly.
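
One way to act on the "small or noisy training sets" trap is to audit the examples before any training run. The sketch below is a minimal illustration, not a full data-quality pipeline. It assumes each example is a dict with hypothetical `input` and `output` keys and that the target output is meant to be JSON; adjust both assumptions to your own format.

```python
import json
from collections import defaultdict


def audit_examples(examples):
    """Flag common problems in a list of {"input": str, "output": str} pairs.

    The key names and the "output must be JSON" rule are assumptions
    made for this sketch, not requirements of any particular API.
    """
    issues = {"empty": [], "invalid_json_output": [], "conflicting": []}
    outputs_by_input = defaultdict(set)

    for i, ex in enumerate(examples):
        inp = ex.get("input", "").strip()
        out = ex.get("output", "").strip()
        if not inp or not out:
            issues["empty"].append(i)  # missing input or output text
            continue
        try:
            json.loads(out)
        except json.JSONDecodeError:
            issues["invalid_json_output"].append(i)  # target is not valid JSON
        outputs_by_input[inp].add(out)

    # The same input mapped to different outputs teaches the model inconsistency.
    issues["conflicting"] = sorted(
        inp for inp, outs in outputs_by_input.items() if len(outs) > 1
    )
    return issues


examples = [
    {"input": "Refund please", "output": '{"label": "refund"}'},
    {"input": "Refund please", "output": '{"label": "billing"}'},
    {"input": "Where is my order?", "output": "shipping"},
    {"input": "", "output": '{"label": "other"}'},
]
report = audit_examples(examples)
print(report)
```

Anything the audit flags is worth fixing or dropping before training, since the model will learn those examples as faithfully as the good ones.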

Signs you need RAG

RAG changes what the model knows at inference time. You're injecting relevant context into the prompt so the model can reason over fresh, grounded information without any weight updates. Think of it as giving the model an open-book exam.

[ ] Your knowledge base changes frequently (docs, policies, product info, pricing, inventory) and you can't retrain every time something updates
[ ] You need the model to cite sources or show where answers came from, which is critical for trust, compliance, or auditability
[ ] You don't have enough labeled data for fine-tuning, but you do have a corpus of documents the model should reference
[ ] Accuracy on factual recall matters more than style: you'd rather the model be correct and a little verbose than stylistically perfect and wrong
[ ] You need to ground responses in private or proprietary documents that weren't in the base model's training data
[ ] You want to improve the model without retraining: just update the document index and the model immediately has access to new knowledge
[ ] Your use case requires long-tail knowledge across a large corpus (thousands of pages) that can't fit in a single prompt
[ ] You need per-query access control: different users should see answers grounded in different document sets
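
To make the open-book idea concrete, here is a toy sketch of retrieval with per-query access control and citable source ids. Everything in it is an assumption for illustration: the `DOCS` corpus is made up, and word-count cosine similarity stands in for a real embedding model and vector store.

```python
import math
import re
from collections import Counter

# Hypothetical corpus: each document lists the groups allowed to read it.
DOCS = [
    {"id": "returns-policy", "groups": {"public"},
     "text": "Items can be returned within 30 days of delivery."},
    {"id": "pricing-internal", "groups": {"sales"},
     "text": "Enterprise discount tiers for the sales team."},
    {"id": "shipping-faq", "groups": {"public"},
     "text": "Standard shipping takes five business days."},
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, user_groups, k=2):
    """Return up to k documents the user may read, most similar first."""
    q = Counter(tokenize(query))
    # Access control happens before ranking, so hidden docs never reach the prompt.
    allowed = [d for d in DOCS if d["groups"] & user_groups]
    scored = [(cosine(q, Counter(tokenize(d["text"]))), d) for d in allowed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(query, docs):
    """Inject retrieved text with ids so the model can cite its sources."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
    return (
        "Answer using only the sources below and cite their ids in brackets.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


question = "How many days do I have to return items?"
print(build_prompt(question, retrieve(question, {"public"})))
```

Filtering by group before scoring is a deliberate choice in this sketch: a document the user cannot read is never ranked, so it cannot leak into the context.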

Watch out for these RAG traps:

Retrieval quality is your ceiling. If poor chunking or a weak embedding model surfaces irrelevant context, the LLM will confidently build its answer on that bad context.
RAG adds latency (embedding query → vector search → rerank → stuff context → generate) and cost (longer prompts = more tokens).
RAG doesn't fix behavior problems. If the model's tone is wrong or it ignores instructions, more context won't help.
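
Because retrieval quality caps answer quality, it helps to measure retrieval on its own, separately from generation. The sketch below computes recall@k over a small hand-labeled set. The document ids and the `runs` structure are hypothetical; in practice they would come from your own retriever and relevance labels.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def mean_recall_at_k(runs, k):
    """Average recall@k over a list of labeled queries."""
    return sum(recall_at_k(r["retrieved"], r["relevant"], k) for r in runs) / len(runs)


# Hypothetical evaluation set: ranked retriever output plus labeled relevant ids.
runs = [
    {"retrieved": ["a", "b", "c"], "relevant": ["a"]},
    {"retrieved": ["d", "e", "f"], "relevant": ["f"]},
    {"retrieved": ["g", "h", "i"], "relevant": ["h", "z"]},
]

for k in (1, 2, 3):
    print(k, round(mean_recall_at_k(runs, k), 3))
```

If recall@k is low, changes to chunking or the embedding model are likely to pay off more than changes to the generation prompt.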

When to combine both

The best production systems often use both. Fine-tune for behavior, RAG for knowledge.

[ ] You need a domain-specific voice AND up-to-date factual grounding (e.g. a medical assistant that speaks like a clinician and references current guidelines)
[ ] You've fine-tuned for output format and task structure, but the model needs access to a live knowledge base at inference time
[ ] You want to reduce prompt size (fine-tuning replaces few-shot examples) while still injecting relevant documents per query (RAG)
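
The prompt-size trade-off in the last item can be sketched as simple arithmetic. All token counts below are made-up placeholders, not measurements; substitute counts from your own tokenizer and prompts.

```python
def prompt_tokens(system, few_shot, retrieved, question):
    """Total input tokens for one request, given the size of each prompt part."""
    return system + few_shot + retrieved + question


# Hypothetical sizes: a long instruction block plus few-shot examples...
baseline = prompt_tokens(system=800, few_shot=2400, retrieved=1500, question=50)

# ...versus a fine-tuned model that needs only a short system prompt,
# while still receiving the same retrieved documents per query.
fine_tuned = prompt_tokens(system=100, few_shot=0, retrieved=1500, question=50)

savings = 1 - fine_tuned / baseline
print(baseline, fine_tuned, f"{savings:.0%}")
```

Under these assumed numbers the retrieved context becomes the dominant part of the prompt, which is why the two techniques complement rather than replace each other.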

Reading your results

More fine-tuning checks? You're optimizing for behavior and style. Invest in a high-quality training set and eval pipeline.
More RAG checks? Your problem is about knowledge. Invest in chunking strategy, embedding quality, and retrieval evaluation.
Tied? Combine both. Fine-tune a base model for your task's behavior, then plug in RAG for factual grounding at inference time.
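
The tally rule above is simple enough to write down directly. This is only a sketch of the checklist's own rule; the function name and return labels are illustrative.

```python
def recommend(fine_tuning_checks: int, rag_checks: int) -> str:
    """Apply the checklist rule: the side with more checks wins, a tie means both."""
    if fine_tuning_checks < 0 or rag_checks < 0:
        raise ValueError("check counts cannot be negative")
    if fine_tuning_checks > rag_checks:
        return "fine-tuning"
    if rag_checks > fine_tuning_checks:
        return "RAG"
    return "combine both"


print(recommend(fine_tuning_checks=5, rag_checks=2))
print(recommend(fine_tuning_checks=1, rag_checks=6))
print(recommend(fine_tuning_checks=4, rag_checks=4))
```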
