Module 3 — Attention, Intuitively
Module 1's bigram model had a fatal flaw: it remembered only the single previous word. Real sentences have dependencies that reach much further back. To predict the next word after "the keys to the cabinet…", you must look past "cabinet" all the way back to "keys" to know it is "were," not "was." The mechanism that lets a model reach back and weigh earlier words is attention, the central idea behind the transformer architecture that most modern LLMs are built on.
The core question attention answers
When predicting the next token, attention asks: of all the earlier tokens, how much should each one matter right now? It assigns every previous token a weight between 0 and 1, and the weights add up to 1. A high weight means "this word is highly relevant to what I'm about to predict"; a near-zero weight means "ignore this one for now."
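To make "weights between 0 and 1 that add up to 1" concrete, here is a minimal sketch using the "keys to the cabinet" example. The relevance scores are invented for illustration, not taken from a trained model, and `softmax` is written out by hand so the snippet needs nothing beyond the standard library.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into positive weights that sum to 1."""
    peak = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical relevance scores for each earlier token (made up for this example).
tokens = ["the", "keys", "to", "the", "cabinet"]
scores = [0.1, 3.0, 0.2, 0.1, 1.5]

weights = softmax(scores)

# Print one weight per earlier token; "keys" gets the largest share.
for token, weight in zip(tokens, weights):
    print(f"{token:>8}  {weight:.2f}")
```

Whatever scores go in, every weight that comes out is positive and the weights sum to 1, which is why they can be read as "how much attention each earlier token gets."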
See where the model should look
Below, the model is predicting the highlighted blank. First decide for yourself which earlier word the prediction most depends on — then reveal the attention weights and see if you agree.
Sharp vs. diffuse attention
Attention is not all-or-nothing. The same relevance scores can produce a sharp focus on one word or a diffuse spread across many, depending on how decisively the softmax (the same function from Course 4) turns the scores into weights: scaling the scores up before the softmax sharpens the focus, and scaling them down spreads it out. Drag the focus dial and watch the weights concentrate or spread, and watch the blended "context" the model carries forward change with them.
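One way to picture sharp versus diffuse attention in code is to multiply the scores by a factor before the softmax. This is a sketch under assumptions: the `focus` parameter and the `attention_weights` helper are hypothetical names chosen for illustration, and the scores are invented.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into positive weights that sum to 1."""
    peak = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(scores, focus):
    """Scale the scores by `focus` before softmax; a larger focus gives sharper weights."""
    return softmax([focus * s for s in scores])

# Hypothetical relevance scores for five earlier tokens.
scores = [0.1, 3.0, 0.2, 0.1, 1.5]

diffuse = attention_weights(scores, focus=0.1)  # weights spread almost evenly
sharp = attention_weights(scores, focus=5.0)    # nearly all weight on one token

print("diffuse:", [round(w, 2) for w in diffuse])
print("sharp:  ", [round(w, 2) for w in sharp])
```

The ranking of the tokens is the same in both cases. Only the contrast between them changes.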
# a relevance score for each earlier token, then softmax to get weights
scores = query @ keys.T       # how well each token matches what we need
weights = softmax(scores)     # positive, sum to 1; the bars you saw
context = weights @ values    # a weighted blend of the earlier tokens
That is the whole idea. Module 4 opens up exactly where query, keys, and values come from — but the move is always: score, soften into weights, blend.
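For readers who want to run the score, soften, blend sequence end to end, the following is a small self-contained sketch in plain Python. The toy 2-dimensional vectors are made up, `dot` and `attend` are illustrative helper names, and the list comprehensions stand in for the matrix products `query @ keys.T` and `weights @ values`. Standard transformer attention also divides the scores by the square root of the key dimension, which is left out here for simplicity.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into positive weights that sum to 1."""
    peak = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Score each earlier token, soften the scores into weights, blend the values."""
    scores = [dot(query, k) for k in keys]      # plays the role of query @ keys.T
    weights = softmax(scores)                   # positive, sum to 1
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(dim)]             # plays the role of weights @ values
    return weights, context

# Toy vectors, invented for illustration: three earlier tokens, two dimensions each.
query = [1.0, 0.0]
keys = [[0.0, 1.0], [4.0, 0.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights, context = attend(query, keys, values)
print("weights:", [round(w, 3) for w in weights])
print("context:", [round(c, 3) for c in context])
```

The second key lines up best with the query, so the second token receives most of the weight and the context vector ends up close to that token's value.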