Module 4 — Gradient Descent in Practice

Putting neurons together · hands-on · about 30 minutes.

Module 3 ran a network whose weights were already good. Now the real question: how does a network find good weights from a pile of random ones? The answer is the engine under all of deep learning — gradient descent — and you met its core idea in Course 2. Here you will run it live, watch a loss curve fall, and discover that one dial, the learning rate, makes the difference between a network that learns and one that blows up.

The training loop

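Training repeats four steps, over and over, for many passes through the data (each full pass is an epoch): run a forward pass, measure the loss, compute the gradient of the loss with respect to each weight, and update the weights. The update step nudges every weight against its gradient: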

\[ w \;\leftarrow\; w \;-\; \eta \,\frac{\partial \,\text{loss}}{\partial w} \]

Here \( \eta \) (eta) is the learning rate. Too small and training crawls; too big and it overshoots the valley and the loss explodes. Hit Train and watch a real network learn — then drag the learning rate and break it on purpose.
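The effect of the learning rate can also be seen in a few lines of plain Python. This is only a sketch under stated assumptions: it uses a toy one-weight loss, (w - 3)^2, rather than the network in this lesson, and the names `loss`, `grad`, and `descend` are made up for illustration. Each step applies the update rule above.

```python
def loss(w):
    # Toy bowl-shaped loss with its minimum at w = 3 (an assumption for illustration).
    return (w - 3.0) ** 2


def grad(w):
    # Derivative of the toy loss with respect to w.
    return 2.0 * (w - 3.0)


def descend(eta, steps=20, w=0.0):
    """Apply w <- w - eta * d(loss)/dw for a fixed number of steps."""
    history = [loss(w)]
    for _ in range(steps):
        w = w - eta * grad(w)
        history.append(loss(w))
    return w, history


# Three learning rates: too small, reasonable, too big.
for eta in (0.01, 0.1, 1.1):
    w, history = descend(eta)
    print(f"eta={eta}: final w={w:.3f}, loss {history[0]:.2f} -> {history[-1]:.3g}")
```

For this toy loss each update multiplies the distance from the minimum by (1 - 2η). So η = 0.01 shrinks it slowly, η = 0.1 shrinks it quickly, and η = 1.1 flips its sign and grows it on every step, which is the blow-up.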

The same training loop in Keras — read only, nothing to install
from tensorflow.keras.optimizers import SGD

# Assumes `model` is an already-built Keras model and X, y hold the training data.
model.compile(optimizer=SGD(learning_rate=0.1),  # η: the dial you are turning
              loss='binary_crossentropy')        # the loss it minimizes
model.fit(X, y, epochs=200)                      # run the loop for 200 epochs

.fit() is the loop above: forward, loss, gradient, update, repeated on every mini-batch of every epoch (Keras uses batches of 32 samples unless you set batch_size). Everything you’re watching on the canvas is what happens inside that one call.
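As a rough picture of what such a call does, the sketch below writes the four steps out by hand for a single sigmoid neuron trained with binary cross-entropy. It is not how Keras is implemented: it is plain Python, it uses the whole dataset for every update, and the `train` function, the OR dataset, and all variable names are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, eta=0.1, epochs=200):
    """Full-batch gradient descent for one sigmoid neuron (illustrative only)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d   # weights start at zero to keep the sketch deterministic
    b = 0.0
    losses = []
    eps = 1e-12     # guards against log(0)
    for _ in range(epochs):
        # 1. Forward pass: one prediction per example.
        preds = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]
        # 2. Loss: mean binary cross-entropy.
        loss = -sum(
            t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
            for p, t in zip(preds, y)
        ) / n
        losses.append(loss)
        # 3. Gradient: for sigmoid + cross-entropy, d(loss)/dz = p - t per example.
        grad_w = [
            sum((p - t) * x[j] for p, t, x in zip(preds, y, X)) / n
            for j in range(d)
        ]
        grad_b = sum(p - t for p, t in zip(preds, y)) / n
        # 4. Update: step every parameter against its gradient.
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b = b - eta * grad_b
    return w, b, losses


# Hypothetical toy data: the logical OR of two inputs.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 1]

w, b, losses = train(X, y, eta=0.1, epochs=200)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Plotting `losses` against the epoch number gives the same kind of falling curve the lesson describes; rerunning `train` with a different `eta` changes how fast it falls.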

AI anchor: one algorithm trains them all

Gradient descent (and close cousins like Adam) trains essentially every neural network in production: image models, recommenders, and language models with hundreds of billions of weights. The scale is staggering, but the loop is exactly this one: forward, loss, gradient, step. Choosing the learning rate (and its schedule) is one of the most consequential knobs a practitioner sets; you just felt why.

Why this matters next

One step in the loop was a black box: "compute which way to nudge each weight." Module 5 opens it. Backpropagation is just the chain rule, run backward through the network, and you will step through it one gradient at a time.

One-sentence summary: a network learns by repeating a training loop (forward pass, loss, gradient, weight update) for many epochs, stepping every weight downhill on the loss; the learning rate sets the step size, and getting it right is the difference between smooth learning and a diverging blow-up.
