
Module 5 — Backpropagation Intuition

Putting neurons together · hands-on · about 30 minutes.

Module 4 left one step as a black box: "compute, for every weight, which way to nudge it." That step is backpropagation, and despite the intimidating name it is just one idea you already know from Course 2 — the chain rule — run backward through the network. This module opens the box so you can watch the blame flow back, weight by weight.

The idea: share out the blame

After a forward pass you have a prediction and a loss. Backprop answers: how much did each weight contribute to that error? It starts at the output, where the error is obvious (prediction minus target), and works backward. A weight near the output gets blamed directly; a weight deep inside gets blamed through everything it fed into — and the chain rule is exactly the bookkeeping that multiplies those influences together.
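To see that bookkeeping concretely, here is a hand-written backward pass for a hypothetical two-weight network: one tanh hidden unit and a squared-error loss with a 0.5 factor, so the blame at the output is exactly prediction minus target. The network, the numbers, and the function names are illustrative assumptions, not the network from the activity. The finite-difference check nudges one weight at a time and re-runs the forward pass, a standard way to confirm hand-derived gradients.

```python
import math

# Hypothetical toy network (illustration only): one input, one hidden unit, one output.
#   h    = tanh(w1 * x)
#   pred = w2 * h
#   loss = 0.5 * (pred - y) ** 2   # the 0.5 makes d(loss)/d(pred) = pred - y

def forward(w1, w2, x, y):
    h = math.tanh(w1 * x)
    pred = w2 * h
    loss = 0.5 * (pred - y) ** 2
    return h, pred, loss

def backward(w1, w2, x, y):
    h, pred, _ = forward(w1, w2, x, y)
    d_pred = pred - y               # blame at the output: prediction minus target
    d_w2 = d_pred * h               # w2 feeds pred directly: a one-link chain
    d_h = d_pred * w2               # pass the blame back through w2 to the hidden unit
    d_w1 = d_h * (1 - h ** 2) * x   # then through tanh (derivative 1 - tanh^2) and w1 * x
    return d_w1, d_w2

def numeric_grad(w1, w2, x, y, eps=1e-6):
    # Central finite differences: nudge one weight, re-run the forward pass, compare losses.
    g1 = (forward(w1 + eps, w2, x, y)[2] - forward(w1 - eps, w2, x, y)[2]) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, y)[2] - forward(w1, w2 - eps, x, y)[2]) / (2 * eps)
    return g1, g2

w1, w2, x, y = 0.5, -1.2, 1.5, 0.8
print("chain rule:        ", backward(w1, w2, x, y))
print("finite differences:", numeric_grad(w1, w2, x, y))
```

The two lines should agree to about six decimal places. Note that the gradient for the deeper weight `w1` is a product of more factors than the one for `w2`: that product is the chain rule multiplying influences together.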

Press "Train one step" below. The network does a forward pass, measures the loss, then backprop lights up each connection by how much it is to blame: a thicker line means a larger gradient (in magnitude). Watch the weights with the most blame move the most, and the loss fall step after step.
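A plain-Python sketch of what pressing the button repeatedly amounts to, assuming a hypothetical one-hidden-unit network trained on a single example (not the network on the canvas). Each step is forward pass, loss, backward pass, update; with plain gradient descent each weight moves by `lr` times its gradient, so the weight carrying more blame takes the bigger step.

```python
import math

# Hypothetical toy network (illustration only), trained on one (x, y) example:
#   h = tanh(w1 * x),  pred = w2 * h,  loss = 0.5 * (pred - y) ** 2
def train_step(w1, w2, x, y, lr):
    # Forward pass
    h = math.tanh(w1 * x)
    pred = w2 * h
    loss = 0.5 * (pred - y) ** 2
    # Backward pass: chain rule from the output error back to each weight
    d_pred = pred - y
    d_w2 = d_pred * h
    d_w1 = d_pred * w2 * (1 - h ** 2) * x
    # Update: each weight moves by lr * its gradient, so more blame means a bigger step
    return w1 - lr * d_w1, w2 - lr * d_w2, loss, (d_w1, d_w2)

w1, w2 = 0.5, -1.2
x, y, lr = 1.5, 0.8, 0.1
losses = []
for step in range(50):
    w1, w2, loss, (g1, g2) = train_step(w1, w2, x, y, lr)
    losses.append(loss)  # loss measured before this step's update
    if step < 3:
        print(f"step {step}: loss={loss:.4f}  |dL/dw1|={abs(g1):.4f}  |dL/dw2|={abs(g2):.4f}")
print(f"loss at step 0: {losses[0]:.4f}, at step 49: {losses[-1]:.6f}")
```

The learning rate of 0.1 and the starting weights are arbitrary choices for the sketch; a much larger `lr` can make the loss bounce instead of fall.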


Backprop in PyTorch — read only, nothing to install
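import torch
X, y = torch.randn(64, 3), torch.randn(64, 1)   # illustrative random data (an assumption)
model = torch.nn.Linear(3, 1)                    # illustrative tiny model (an assumption)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
pred = model(X)                 # forward pass
loss = loss_fn(pred, y)         # one number
loss.backward()                 # BACKPROP: fills every weight's .grad via the chain rule
optimizer.step()                # update: w ← w − η · w.grad
optimizer.zero_grad()           # clear grads for the next step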

That single loss.backward() call is the backward flow on the canvas — it computes \( \partial\text{loss}/\partial w \) for every weight automatically. You never write the chain rule by hand; the framework does it for you.
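A minimal sketch of what a call like `loss.backward()` does under the hood, assuming a scalar-only engine in the style of small teaching autograd libraries. The `Value` class and its methods are hypothetical, written for illustration; PyTorch's real autograd works on tensors and is far more elaborate. Each value remembers which values produced it and the local derivative with respect to each; `backward()` orders the graph once and then sweeps it in reverse, applying the chain rule at every node.

```python
import math

class Value:
    """Hypothetical teaching sketch: a scalar that remembers how it was computed."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data - other.data, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1 - t * t,))

    def backward(self):
        # Topologically order the graph so each node comes after everything it depends on...
        order, seen = [], set()

        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        # ...then sweep it once in reverse, applying the chain rule at each node.
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += v.grad * local  # accumulate: a value may feed several nodes


# Usage: the same toy network as elsewhere, h = tanh(w1 * x), pred = w2 * h
w1, w2 = Value(0.5), Value(-1.2)
x, y = 1.5, 0.8
pred = w2 * (w1 * x).tanh()
d = pred - y
loss = d * d * 0.5
loss.backward()
print(w1.grad, w2.grad)
```

Because `+=` accumulates, calling `backward()` a second time would not reset `.grad`. PyTorch likewise accumulates gradients into `.grad` by default, which is why the training snippet clears them with `optimizer.zero_grad()` before the next step.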

AI anchor: the algorithm that made deep learning possible
Backpropagation, popularized in 1986, is what makes training deep networks practical: it computes all the gradients in one efficient backward sweep instead of poking each weight separately. Every modern framework (PyTorch, TensorFlow, JAX) is built around an "autograd" engine that does exactly this, at the scale of billions of weights. The connections lighting up on the canvas are, in miniature, what trains every large model today.
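To make "instead of poking each weight separately" concrete, the sketch below gets the gradient of a hypothetical single-layer model with 1,000 weights in two ways: finite differences, which nudges one weight at a time and needs one extra forward pass per weight, and a backward sweep, which gets every gradient from one forward pass plus one backward pass. The model, data, and the global pass counter are illustrative assumptions.

```python
import random

# Hypothetical model (illustration only): pred = sum(w_i * x_i), loss = 0.5 * (pred - y) ** 2
forward_calls = 0  # counts forward passes so the two approaches can be compared

def loss_fn(w, x, y):
    global forward_calls
    forward_calls += 1
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (pred - y) ** 2

def grad_by_poking(w, x, y, eps=1e-6):
    # Finite differences: nudge each weight separately, one extra forward pass per weight.
    base = loss_fn(w, x, y)
    grads = []
    for i in range(len(w)):
        nudged = list(w)
        nudged[i] += eps
        grads.append((loss_fn(nudged, x, y) - base) / eps)
    return grads

def grad_by_backprop(w, x, y):
    # One forward pass, then one backward sweep yields every gradient at once.
    global forward_calls
    forward_calls += 1
    pred = sum(wi * xi for wi, xi in zip(w, x))
    d_pred = pred - y                  # blame at the output
    return [d_pred * xi for xi in x]   # chain rule: d(pred)/d(w_i) = x_i

random.seed(0)
n = 1000
w = [random.uniform(-1, 1) for _ in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]
y = 0.3

forward_calls = 0
g_poke = grad_by_poking(w, x, y)
poke_calls = forward_calls

forward_calls = 0
g_back = grad_by_backprop(w, x, y)
back_calls = forward_calls

print(f"poking: {poke_calls} forward passes; backprop: {back_calls} forward pass + 1 backward sweep")
print("largest disagreement:", max(abs(a - b) for a, b in zip(g_poke, g_back)))
```

The gap grows with the number of weights: poking needs n + 1 forward passes, while a backward sweep costs a small constant multiple of one forward pass regardless of n.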


Why this matters next
You now have the whole engine: forward, loss, backprop, update. Module 6 asks what you actually buy by stacking more layers; you will add depth to a network trained on a tangled spiral and watch it carve out a decision boundary that no shallow model could.
One-sentence summary: backpropagation is the chain rule run backward through the network — it starts from the output error and shares the blame to every weight as a gradient, so gradient descent knows which way to nudge each one; the weights most responsible for the error get the biggest gradient and move the most.

Next: What Depth Buys You →