Index

All chapters

From tokenization to alignment. Each chapter includes at least one interactive visualization.

I. Anatomy of a Model
  01. Foundations: Predicting one word at a time (6 min)
      What is a language model? Why predicting the next word is enough to make intelligence emerge.
  02. Tokenization: From text to tokens (8 min)
      How text becomes numbers. BPE, subwords, and why LLMs struggle to count letters.
  03. Embeddings: The space of meaning (10 min)
      Words in a geometric space. King − Man + Woman = Queen, and other vector miracles.
  04. Attention: Attention is all you need (12 min)
      The mechanism that changes everything. How each token looks at all others to understand context.
  05. Architecture: The Transformer, in full (14 min)
      Putting the pieces together: multi-head attention, feed-forward layers, normalization, residual connections.
II. Training and Alignment
  06. Training: How it learns (10 min)
      Loss, gradient descent, backpropagation, and why billions of parameters are needed.
  07. Generation: Choosing the next word (7 min)
      Temperature, top-k, top-p. The art of turning a probability distribution into text.
  08. Alignment: From raw model to assistant (9 min)
      Fine-tuning, RLHF, constitutional AI. How we make an LLM useful and harmless.
III. The Model in Production
  09. Context: What the model remembers (8 min)
      The context window: perfect but bounded memory. Why ChatGPT forgets and what it costs.
  10. RAG: Reading your documents (9 min)
      How an LLM accesses thousands of pages without memorizing them. Embeddings, semantic search, injected context.
  11. Agents: From a model that replies to a model that acts (10 min)
      Tool use, the ReAct loop, multi-step tasks. How an LLM becomes an agent capable of acting in the world.
  12. Prompting: The art of talking to an LLM (8 min)
      Zero-shot, few-shot, chain-of-thought, self-consistency. Why prompt wording radically changes what a model produces.
  13. Hallucinations: Why LLMs make things up (9 min)
      Calibration, confident falsehoods, countermeasures. The structural mechanism behind the most common criticism, and what we can actually do about it.
IV. Going Further
  14. Fine-tuning: Specializing a model without retraining everything (9 min)
      LoRA, QLoRA, SFT. How to adapt a generalist model to a specific domain by training 0.1% of its parameters.
  15. Multimodality: When the model reads images (8 min)
      Patch embedding, ViT, CLIP. How a text Transformer becomes multimodal by treating an image as a grid of tokens.
  16. Evaluation: How do we know a model is better? (8 min)
      MMLU, HumanEval, LMSYS Arena. Why measuring LLM intelligence is hard, and why no single benchmark is enough.
  17. Reasoning: Think before you answer (9 min)
      Thinking tokens, extended reasoning, thinking budgets. How o1/o3-class models generate a hidden chain of thought before responding.
  18. Inference: Why the second token is faster than the first (8 min)
      The KV cache and autoregressive generation. Prefill vs. decode, TTFT, and why the cache changes everything.
  19. Scaling: Bigger, always better? (9 min)
      Kaplan and Chinchilla scaling laws. Why GPT-3 was undertrained, and the optimal 20-tokens-per-parameter ratio.
  20. Interpretability: What's really going on inside? (9 min)
      Circuits, polysemantic neurons, sparse autoencoders. How Anthropic and DeepMind are opening the black box.
  21. Diffusion: Generating an image by erasing noise (9 min)
      Stable Diffusion, DALL-E, Midjourney. The reverse denoising process, the role of CLIP, and why the U-Net is giving way to Transformers.
All chapters · Step by Token