Module 7 — Sampling & Generation
Your trained model does not output words. At every step it outputs a probability for every possible next token: a whole distribution. Turning that distribution into one actual token is a separate decision called sampling. It is often the difference between a model that sounds robotic and one that sounds creative. Same model, same weights; only the sampling rule changes. This module covers the three main knobs: temperature, top-k, and top-p.
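The loop described above can be sketched with a toy stand-in for a model. At each step the "model" returns a next-token distribution, a separate sampling rule picks one token, and that token is appended and fed back in. The `TOY_MODEL` bigram table and the helper names are hypothetical. A real model conditions on the whole context, not just the previous token.

```python
import random

# Hypothetical toy "model": maps the previous token to a next-token distribution.
# A real model would compute this distribution from the entire context.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "end": 0.2},
    "a": {"cat": 0.4, "dog": 0.4, "end": 0.2},
    "cat": {"sat": 0.7, "end": 0.3},
    "dog": {"sat": 0.6, "end": 0.4},
    "sat": {"end": 1.0},
}

def greedy(dist, rng):
    # Always take the single most likely token (rng is unused).
    return max(dist, key=dist.get)

def weighted(dist, rng):
    # Roll a weighted die over every token in the distribution.
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]

def generate(pick, seed=0, max_steps=10):
    rng = random.Random(seed)
    out = ["<s>"]
    for _ in range(max_steps):
        dist = TOY_MODEL[out[-1]]  # the "model" produces a distribution...
        nxt = pick(dist, rng)      # ...and the sampling rule collapses it to one token
        if nxt == "end":
            break
        out.append(nxt)
    return out[1:]  # drop the start marker

print(generate(greedy))             # ['the', 'cat', 'sat']
print(generate(weighted, seed=1))   # varies with the seed
```

Swapping `greedy` for `weighted` changes the text without touching `TOY_MODEL` at all.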
The choice the model leaves you
Suppose the next-token distribution is "the" 60%, "a" 25%, "that" 10%, "one" 5%. You could always take the most likely token (greedy decoding): safe but repetitive. Or you could roll a weighted die and let the long tail occasionally win: varied but riskier. Every generation knob is a way of reshaping this distribution before the die is rolled.
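The two choices can be compared directly on the example distribution. This is a minimal sketch using only Python's standard library; `greedy` and `sample` are illustrative helper names, not part of any particular framework.

```python
import random
from collections import Counter

# The example next-token distribution from the text.
dist = {"the": 0.60, "a": 0.25, "that": 0.10, "one": 0.05}

def greedy(dist):
    # Deterministic: always the most likely token.
    return max(dist, key=dist.get)

def sample(dist, rng):
    # Stochastic: each token wins in proportion to its probability.
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]

rng = random.Random(0)
print(greedy(dist))  # the
counts = Counter(sample(dist, rng) for _ in range(10_000))
print({t: counts[t] / 10_000 for t in dist})  # roughly 0.60 / 0.25 / 0.10 / 0.05
```

Greedy returns "the" every time, while sampling picks "one" about 5% of the time. That is the long tail occasionally winning.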
The three knobs
- Temperature: divides the raw scores (logits) by T before softmax. Low T (toward 0) sharpens the distribution toward greedy; high T (above 1) flattens it toward uniform. It rescales confidence without changing which token ranks first.
- Top-k: keep only the k most likely tokens, zero the rest, renormalize. A hard cap on how far into the tail you will ever reach.
- Top-p (nucleus): sort tokens by probability and keep the smallest set of the most likely ones whose probabilities sum to at least p (say 0.9); drop the rest and renormalize. The cutoff adapts: few tokens when the model is confident, more when it is unsure.
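The three knobs above can each be written as a small function over a token-to-probability mapping. Temperature computes softmax(z_i / T) over the logits z_i. This is a minimal pure-Python sketch, not any library's implementation. It assumes logits equal to the log of the example probabilities, so softmax at T = 1 reproduces 60/25/10/5. The `confident` and `unsure` distributions are hypothetical toy data that illustrate the adaptive nucleus.

```python
import math

def softmax(logits):
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {t: math.exp(z - m) for t, z in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

def apply_temperature(logits, temperature):
    # Divide every logit by T before softmax: T < 1 sharpens, T > 1 flattens.
    return softmax({t: z / temperature for t, z in logits.items()})

def top_k(probs, k):
    # Keep the k most likely tokens, drop the rest, renormalize.
    kept = sorted(probs, key=probs.get, reverse=True)[:k]
    total = sum(probs[t] for t in kept)
    return {t: probs[t] / total for t in kept}

def top_p(probs, p):
    # Keep the smallest run of most likely tokens whose cumulative
    # probability reaches at least p, then renormalize.
    kept, cumulative = [], 0.0
    for t in sorted(probs, key=probs.get, reverse=True):
        kept.append(t)
        cumulative += probs[t]
        if cumulative >= p:
            break
    total = sum(probs[t] for t in kept)
    return {t: probs[t] / total for t in kept}

# Assumed logits: log of the example probabilities.
example = {"the": 0.60, "a": 0.25, "that": 0.10, "one": 0.05}
logits = {t: math.log(q) for t, q in example.items()}
base = apply_temperature(logits, 1.0)

print(apply_temperature(logits, 0.5))  # sharper: "the" rises above 0.60
print(apply_temperature(logits, 2.0))  # flatter: "the" falls below 0.60
print(top_k(base, 2))                  # "the" ≈ 0.706, "a" ≈ 0.294
print(top_p(base, 0.9))                # "the", "a", "that" survive; "one" is dropped

# Hypothetical distributions showing the nucleus adapting to confidence.
confident = {"the": 0.95, "a": 0.03, "that": 0.01, "one": 0.01}
unsure = {"the": 0.30, "a": 0.30, "that": 0.25, "one": 0.15}
print(len(top_p(confident, 0.9)), len(top_p(unsure, 0.9)))  # 1 4
```

With the same p = 0.9, the nucleus holds one token when the model is confident and all four when it is unsure. Top-k with a fixed k cannot make that distinction.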
See them reshape the distribution
    logits = model(context)[-1]             # scores for the next token
    probs = softmax(logits / temperature)   # temperature reshapes confidence
    probs = top_k(probs, k=40)              # keep the 40 most likely, renormalize
    probs = top_p(probs, p=0.9)             # then the nucleus inside that, renormalize
    next_token = sample(probs)              # roll the weighted die
These knobs map directly onto request parameters in hosted APIs: the OpenAI API exposes temperature and top_p, and the Anthropic API exposes temperature, top_p, and top_k. They do not change the model, only how its distribution is collapsed into one token.
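The pseudocode pipeline can be made runnable with only the standard library. This is a sketch, not how any particular library or API implements sampling. The fixed `logits` list is a hypothetical stand-in for `model(context)[-1]`, and the function names simply mirror the pseudocode. Treating `temperature=0` as pure greedy (argmax) is an assumed convention that avoids dividing by zero.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k):
    # Zero every token outside the k largest, then renormalize.
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

def top_p(probs, p):
    # Zero every token outside the nucleus (cumulative mass >= p), then renormalize.
    keep, cumulative = set(), 0.0
    for i in sorted(range(len(probs)), key=lambda i: probs[i], reverse=True):
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

def sample_next(logits, temperature=1.0, k=40, p=0.9, rng=random):
    if temperature == 0:
        # Assumed convention: T = 0 means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax([z / temperature for z in logits])  # temperature
    probs = top_k(probs, k)                              # hard cap on the tail
    probs = top_p(probs, p)                              # adaptive nucleus
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]  # weighted die

vocab = ["the", "a", "that", "one"]
logits = [2.0, 1.1, 0.2, -0.5]  # hypothetical scores for the next token
rng = random.Random(0)
print(vocab[sample_next(logits, temperature=0)])  # the
print([vocab[sample_next(logits, temperature=0.8, k=3, p=0.9, rng=rng)] for _ in range(5)])
```

Each stage only zeroes or rescales probabilities. The logits, and therefore the model, are never modified.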