Module 6 — Training a Tiny Language Model
You have built the whole architecture: embeddings, attention, the transformer block. But a freshly built model is random. Its weights are noise, so its predictions are noise. Where does the "knowledge" come from? The same place it did in Course 4: training. In this module you train a real language model live, in your browser, and watch its loss fall and its output crawl from gibberish toward English.
What "training a language model" means
It is exactly the Course 4 loop, applied to one specific job: predicting the next token, which here is a single character. At every position in the training text, the model outputs a probability distribution over possible next characters, and we measure how surprised it was by the character that actually came next: the negative log of the probability it assigned to that character. Averaged over all positions, that surprise is the loss (cross-entropy). Gradient descent then nudges every weight to reduce it. Repeat over the whole text many times (each full pass is an epoch), and the predictions sharpen.
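A small sketch of the two ideas in that paragraph: turning text into (context, next character) training examples, and scoring a prediction by its surprise, the negative log probability of the character that actually came. The text and the two probability tables are made-up illustrations, not output from the course's model.

```python
import math

text = "hello world"

# Every position becomes a training example: the characters so far -> the character that came next.
pairs = [(text[: i + 1], text[i + 1]) for i in range(len(text) - 1)]
print(pairs[:3])  # [('h', 'e'), ('he', 'l'), ('hel', 'l')]

def surprise(probs, actual):
    """Cross-entropy at one position: -log of the probability given to the actual next character."""
    return -math.log(probs[actual])

# Two hypothetical predictions for the character after "h"; the truth is "e".
confident = {"e": 0.9, "a": 0.05, "o": 0.05}
unsure = {"e": 0.2, "a": 0.4, "o": 0.4}
print(round(surprise(confident, "e"), 3))  # 0.105
print(round(surprise(unsure, "e"), 3))     # 1.609

def mean_loss(predictions, targets):
    """The training loss: average surprise over all positions."""
    return sum(surprise(p, t) for p, t in zip(predictions, targets)) / len(targets)
```

Putting high probability on the right character gives a small loss; spreading probability onto wrong characters gives a large one, which is exactly what gradient descent pushes against.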
The model below is a genuine trainable next-character model — real weights, real cross-entropy, real gradient descent. Nothing here is faked: when you click Train, it actually learns.
Reading the loss curve
The falling curve is the model getting less surprised by its training text, that is, better at predicting the next character. Early on it drops fast (easy wins, like learning that a space often follows certain letters), then flattens as it squeezes out the harder patterns. This is the same shape you saw when training neural nets in Course 4, because it is the same process; only the task (predicting the next token) is specific to language.
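Two reference levels can help when reading where the curve starts and where the first drop lands. Assuming the untrained model's scores are close to zero, it spreads probability evenly over all V characters, so its loss is about ln V. A model that has only learned how common each character is (one of the "easy wins", ignoring context entirely) already does better. The toy text and the frequency-only model below are illustrative assumptions, not the course's demo.

```python
import math
from collections import Counter

text = "the cat sat on the mat"
vocab = sorted(set(text))
V = len(vocab)

# Untrained model with near-zero scores: uniform guesses, so loss is roughly ln V.
uniform_loss = math.log(V)
print(round(uniform_loss, 3))  # 2.303

# Frequency-only model: predict each character with its share of the text, regardless of context.
counts = Counter(text)
freq_loss = -sum(math.log(counts[c] / len(text)) for c in text) / len(text)
print(round(freq_loss, 3))
```

A curve that starts near ln V and quickly falls below the frequency-only level is the model picking up context, not just letter counts.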
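```python
import torch.nn.functional as F  # assumes PyTorch; model, inputs, targets, optimizer, n_epochs defined earlier

for epoch in range(n_epochs):
    logits = model(inputs)                                   # predict next-token scores, assumed shape (batch, seq, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))              # how surprised were we?
    optimizer.zero_grad()                                    # clear gradients left over from the last step
    loss.backward()                                          # gradients (Course 4 backprop)
    optimizer.step()                                         # nudge every weight downhill
```

This is the same loop that trained the spiral classifier in Course 4: predict, measure the loss, backpropagate, step. An LLM is trained by this same loop, just with a transformer, far more text, and far more compute.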
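To see every step without a framework, here is the same predict / loss / backward / step loop written out in plain Python. It is a sketch under simplifying assumptions: the model is a hypothetical bigram table (one row of scores per previous character, far simpler than a transformer), training is full-batch gradient descent, and the gradient is computed by hand from the softmax cross-entropy derivative p - onehot instead of by autograd.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_bigram(text, n_epochs=200, lr=2.0, seed=0):
    """Full-batch gradient descent on a hypothetical bigram model: one row of scores per previous character."""
    rng = random.Random(seed)
    vocab = sorted(set(text))
    idx = {ch: i for i, ch in enumerate(vocab)}
    V = len(vocab)
    # Freshly built model: tiny random weights, so predictions are close to uniform.
    W = [[rng.gauss(0.0, 0.01) for _ in range(V)] for _ in range(V)]
    pairs = [(idx[a], idx[b]) for a, b in zip(text, text[1:])]
    n = len(pairs)
    losses = []
    for _ in range(n_epochs):
        grad = [[0.0] * V for _ in range(V)]     # fresh gradient table each epoch
        total = 0.0
        for prev, nxt in pairs:
            probs = softmax(W[prev])             # 1. predict next-character probabilities
            total += -math.log(probs[nxt])       # 2. loss: surprise at the actual next character
            for j in range(V):                   # 3. backward: d(loss)/d(score_j) = p_j - [j == nxt]
                grad[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
        for i in range(V):                       # 4. step: nudge every weight downhill
            for j in range(V):
                W[i][j] -= lr * grad[i][j] / n
        losses.append(total / n)                 # mean loss measured before this epoch's update
    return losses

losses = train_bigram("the cat sat on the mat")
print(f"epoch 0: {losses[0]:.3f}   epoch {len(losses) - 1}: {losses[-1]:.3f}")
```

Printing `losses` every few epochs should show the shape described above: a quick drop from about ln V, then a flattening as the model approaches the best it can do with only one character of context.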