Chapter 13 · Hallucinations · 9 min

Why LLMs make things up

Calibration, confident falsehoods, countermeasures. The structural mechanism behind the most common criticism — and what we can actually do about it.

The most common criticism

You ask an LLM a question on a niche topic. It answers confidently: a bibliographic reference, a date, a quote. You check. The book doesn't exist. The date is wrong. The quote was never said.

This phenomenon has an official name — hallucination — and it's probably the first thing people criticize about LLMs. Not an occasional bug: a structural property. To understand why, you have to go back to how the model was trained.

Three mechanisms that combine

1. Cross-entropy doesn't reward uncertainty. During pre-training (chapter 06), the objective is to maximize the log-probability of the correct token, i.e. minimize the cross-entropy loss. At no point does the model learn to say "I don't know": it learns to always predict something, as plausibly as possible. If the right answer isn't in its weights, it produces the most likely-sounding string, not an admission of ignorance.
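To see why, look at what the loss actually measures. The toy computation below (purely illustrative values, not from any real training run) compares the cross-entropy paid for a confident guess with the cross-entropy paid for putting mass on an abstention token: the loss only looks at the probability assigned to the reference token, so abstaining is always penalized.

```python
import math

# Toy vocabulary and reference token; illustrative values only.
reference = "1991"  # the token that actually appears in the training data

def cross_entropy(probs: dict, target: str) -> float:
    # loss = negative log-probability assigned to the reference token
    return -math.log(probs[target])

confident_guess = {"1989": 0.05, "1991": 0.90, "I don't know": 0.05}
honest_abstain  = {"1989": 0.10, "1991": 0.10, "I don't know": 0.80}

print(cross_entropy(confident_guess, reference))  # ~0.11 -> low loss, rewarded
print(cross_entropy(honest_abstain,  reference))  # ~2.30 -> high loss, penalized
```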

2. RLHF rewards confidence over honesty. During alignment (chapter 08), humans rank responses. On average, they prefer a confident, well-phrased answer to "I'm not sure". The reward model learns this bias, and the LLM learns to sound sure — even when it isn't.

3. No internal verification loop. A human inventing a quote pauses, doubts, checks. An LLM generating token by token has no such mechanism. It moves forward with no external check, and each token conditions the next under the same plausibility logic.

A hallucination isn't a bug. It's what happens when a system trained to always produce plausible text meets a question whose answer isn't in its weights.

The calibration problem

A well-calibrated model is one whose stated confidence matches the probability of being correct. If it says "I'm 80% sure", it should be right about 80% of the time.
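Calibration can be measured directly. Here is a minimal sketch, assuming you have already collected pairs of the model's stated confidence and whether its answer turned out to be correct; the function name and the choice of ten bins are illustrative, not a standard API.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average gap between stated confidence and actual accuracy, weighted per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the share of answers in this bin
    return ece

# A model that says "80% sure" should be right about 80% of the time in that bin;
# a large gap means it is miscalibrated.
```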

Raw LLMs (before RLHF) are surprisingly well calibrated on their internal probabilities. But alignment degrades that calibration: by rewarding confidence, it pulls the model's stated certainty away from the statistical truth of its own predictions.

That's what explains the "confidently hallucinating" mode: it's not that the model doesn't know it doesn't know. It's that training pushed it to mask that uncertainty.

The model assigns a probability to every token it generates, and a wrong but coherent statement often ends up highly probable: that's the structural mechanism behind hallucinations, not a one-off bug a patch can fix.


Four families of countermeasures

Hallucinations don't go away with better alignment. They're structural. To reduce them in practice, you need systemic levers, not just a better model.

1. Connect the model to tools (chapter 11)

The rule: anything an LLM does badly, delegate to a deterministic system. Compute a derivative? Code interpreter. Get a stock quote? API. Check that a file exists? File system tool. The model no longer tries to guess the result — it observes it.

Effect: hallucinations on domains covered by tools all but disappear. Hallucinations on other domains remain.
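A minimal sketch of the pattern, using the file-existence case from above. The call_llm helper is a placeholder for whatever chat client you use, not a real library function; the point is that the fact comes from the tool, and the model only phrases it.

```python
import os

def call_llm(prompt: str) -> str:
    # placeholder for any chat-completion client; plug in your own
    raise NotImplementedError

def does_file_exist(path: str) -> str:
    observed = os.path.exists(path)  # deterministic fact, observed rather than guessed
    return call_llm(
        f"The user asked whether {path} exists.\n"
        f"Tool result: {observed}\n"
        "Answer using only the tool result above."
    )
```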

2. RAG (chapter 10)

Instead of asking the model what it remembers about a topic, you give it reliable sources at answer time. Bibliographic and factual hallucinations decrease sharply, because the model can quote what it reads, not just what it imagines.

Limit: if the sources are bad or poorly retrieved, the model hallucinates on their content. And even with good sources, it can over-extrapolate ("the source says X, therefore necessarily Y").
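A minimal sketch of the pattern, assuming a generic retrieve function (BM25, embeddings, whatever your stack uses) and a call_llm callable like the placeholder above; none of these names refer to a specific library.

```python
def answer_with_rag(question: str, retrieve, call_llm, k: int = 3) -> str:
    # 1. Fetch the k most relevant passages for the question.
    passages = retrieve(question, k=k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    # 2. Ask the model to answer from the sources, and to abstain otherwise.
    prompt = (
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the sources above, citing them as [n]. "
        "If the sources don't contain the answer, say you don't know."
    )
    return call_llm(prompt)
```

The grounding instruction only reduces the risk: if retrieval returns the wrong passages, the model will confidently summarize the wrong passages.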

3. Extended reasoning (chapter 17)

A model that takes time to verify its own draft before answering makes fewer mistakes. Reasoning models (o1, o3, Claude with extended thinking) generate a hidden chain of thought in which they can recompute a result, contradict an earlier step, or take another path.

It's imperfect — a model can hallucinate inside its reasoning too — but the simple act of unfolding the steps catches a significant portion of errors.
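You can approximate the idea from the outside with two plain calls: draft, then review the draft. This is a rough sketch of the principle, not how reasoning models are actually implemented; call_llm is the same kind of stand-in for a chat client as above.

```python
def draft_then_verify(question: str, call_llm) -> str:
    # First pass: produce a step-by-step draft.
    draft = call_llm(f"Question: {question}\nAnswer step by step.")
    # Second pass: ask the model to audit its own draft before finalizing.
    return call_llm(
        f"Question: {question}\n"
        f"Draft answer:\n{draft}\n\n"
        "Check each claim and each step. Flag anything uncertain or unsupported, "
        "then give a corrected final answer."
    )
```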

4. Explicit fine-tuning on uncertainty

The most promising research angle: train the model to abstain. You show it (question, answer) pairs in which, whenever the model's internal probability is low, the target answer is "I don't know" or "I don't have that information". The model learns to recognize its own confidence level and communicate it.

Several labs are working on this (DeepMind, Anthropic). It's still far from robust, but it's the only technique that really attacks the root of the problem.
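A toy sketch of what building such training examples could look like, assuming you already have the model's own probability for the reference answer; the threshold value and the helper name are hypothetical.

```python
ABSTAIN = "I don't have that information."

def make_abstention_example(question: str, gold_answer: str,
                            answer_prob: float, threshold: float = 0.5) -> dict:
    # When the model's own probability for the reference answer is low,
    # the fine-tuning target becomes an explicit abstention.
    target = gold_answer if answer_prob >= threshold else ABSTAIN
    return {"prompt": question, "completion": target}
```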

Detecting a hallucination in practice

A few useful heuristics on the user side:

  • Ask for sources. If the model can't cite its sources, or invents them, treat the answer as suspect.
  • Check what's specific. Proper nouns, dates, exact figures, quotes are the risk zones. General content is usually OK.
  • Rephrase the question differently. A model that is inventing things often gives consistent answers to minor rephrasings of the same question, but inconsistent answers when the question is framed very differently.
  • Ask the model its confidence level. Imperfect, but correlated with actual answer quality, especially in recent models.
  • Cross-check with another model. Hallucinations are rarely the same across models. An answer where GPT-4 and Claude converge is much more likely to be correct; a minimal version of this check is sketched after this list.
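The last heuristic is easy to automate. A crude sketch, assuming two generic ask_model_a / ask_model_b callables rather than any specific SDK: compare the specific details (capitalized words, numbers) that both answers share.

```python
def cross_check(question: str, ask_model_a, ask_model_b) -> dict:
    def specifics(text: str) -> set:
        # crude proxy for risky details: capitalized words and numbers
        return {w.strip(".,;:") for w in text.split()
                if w[:1].isupper() or w[:1].isdigit()}

    a = ask_model_a(question)
    b = ask_model_b(question)
    shared = specifics(a) & specifics(b)
    return {"answer_a": a, "answer_b": b, "agreeing_specifics": sorted(shared)}
```

Agreement is evidence, not proof: two models can share a training-data error, but they rarely invent the same reference or date independently.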

What to take away

Three things.

One. Hallucinations aren't a model defect: they're the consequence of its training objective. No surface-level fine-tuning makes them disappear.

Two. The countermeasures that work in production are systemic (RAG, tools, reasoning, abstention). None is perfect alone; combined, they bring the hallucination rate down to acceptable levels for most cases.

Three. For the end user, the best defense is not to trust blindly, especially on specific details (sources, dates, figures). An LLM answering you confidently isn't proof it's right.

Asking an LLM "are you sure?" isn't verification. It's just another round of plausible text generation.
