Module 6 — Derivatives & Gradient Descent
You now know what a loss function is: a number measuring how wrong a model is. This module answers the central question of machine learning, namely how a model makes that number smaller. The answer is gradient descent: follow the slope downhill. First we need the slope, which is what a derivative measures.
The derivative: slope at a point
The derivative of a function is its slope: how fast the output changes as you nudge the input. On a curve, it is the slope of the tangent line, the straight line that just touches the curve at that point. Steep and rising → large positive derivative; steep and falling → large negative derivative; flat → zero.
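"Nudge the input and see how much the output moves" can be turned directly into a number. The sketch below is a minimal illustration, assuming an example curve \( f(x) = x^2 \) and a hypothetical helper named `derivative`. It estimates the slope with a central finite difference: it nudges the input a tiny amount `h` in each direction and divides the change in output by the total nudge. The value of `h` is an arbitrary small choice.

```python
def derivative(f, x, h=1e-5):
    """Estimate f'(x) with a central difference: (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f(x):
    # Example curve: a parabola whose lowest point is at x = 0
    return x ** 2


# Rising side, falling side, and the flat bottom of the parabola
for x in (3.0, -2.0, 0.0):
    print(f"x = {x:5.1f}   slope ~ {derivative(f, x):.4f}")
```

The loop reports slopes of about 6, -4, and 0, which match the exact derivative \( f'(x) = 2x \): positive where the curve rises, negative where it falls, zero at the flat bottom.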
You don't need to compute derivatives by hand here — you need the one fact that drives training: the derivative points uphill, so its negative points downhill. That sign is the compass.
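To see the sign acting as a compass, the sketch below moves a small fixed distance opposite to the sign of the slope. It assumes a hypothetical curve \( f(x) = (x - 1)^2 \) with its bottom at \( x = 1 \), a hypothetical helper named `step_against_slope`, and an arbitrary step distance of 0.1. Whether you start to the right or to the left of the bottom, the output drops.

```python
def slope(f, x, h=1e-5):
    # Central-difference estimate of the derivative f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)


def step_against_slope(f, x, distance=0.1, flat_tol=1e-9):
    """Move a fixed distance in the downhill direction, i.e. opposite the slope's sign."""
    s = slope(f, x)
    if s > flat_tol:      # rising to the right, so downhill is to the left
        return x - distance
    if s < -flat_tol:     # falling to the right, so downhill is to the right
        return x + distance
    return x              # (numerically) flat: already at the bottom


def f(x):
    # Example curve with its bottom at x = 1
    return (x - 1) ** 2


for start in (4.0, -2.0):
    new = step_against_slope(f, start)
    print(f"x: {start:+.1f} -> {new:+.1f}   f: {f(start):.2f} -> {f(new):.2f}")
```

From \( x = 4 \) the slope is positive, so the step goes left. From \( x = -2 \) the slope is negative, so the step goes right. In both cases \( f \) falls from 9 to 8.41.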
Minima: where the slope is zero
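A function's minimum is its lowest point, and for a smooth curve the slope there is flat (zero). A loss function's minimum is the best the model can do. Gradient descent is a procedure for walking toward that bottom without ever being told where it is, using only the local slope under your feet. On a curve with several valleys, it settles into the bottom of whichever valley it rolls into, which is not always the lowest one.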
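As a worked example, take a hypothetical one-parameter loss \( L(w) = (w - 3)^2 \). Setting its derivative to zero locates the flat spot. The sign of the derivative on either side (falling to the left, rising to the right) confirms that the flat spot is a bottom rather than a peak:

\[
\begin{aligned}
L(w) &= (w - 3)^2, & L'(w) &= 2(w - 3) \\
L'(w) &= 0 \;\Longrightarrow\; w = 3, & L(3) &= 0 \\
L'(2) &= -2 < 0 \ \text{(falling)}, & L'(4) &= 2 > 0 \ \text{(rising)}
\end{aligned}
\]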
Gradient descent: roll the ball downhill
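Stand somewhere on the curve. Measure the slope. Take a small step in the downhill direction. Repeat. The step rule is the heart of all model training. If \( f'(x) \) is the slope at your current position \( x \), the next position is:
\[ x_{\text{new}} = x - \eta \, f'(x) \]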
Here \( \eta \) (eta) is the learning rate: how big a step to take. Too small and you crawl toward the bottom. Too big and you overshoot it and bounce from side to side. Past a certain size, each bounce lands farther out than the last, so the process never settles. The goal is a rate that reaches the bottom quickly without blowing up.
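Here is a minimal sketch of the update \( x_{\text{new}} = x - \eta \, f'(x) \) in Python. It assumes the example curve \( f(x) = x^2 \), whose bottom is at \( x = 0 \) and whose slope is \( f'(x) = 2x \). It also assumes a starting point of \( x = 5 \) and three illustrative learning rates. The function name `gradient_descent` and all the numbers are hypothetical choices for the demonstration.

```python
def f_prime(x):
    # Slope of the example curve f(x) = x**2
    return 2 * x


def gradient_descent(x, eta, steps):
    """Apply x <- x - eta * f'(x) repeatedly; return every position visited."""
    path = [x]
    for _ in range(steps):
        x = x - eta * f_prime(x)
        path.append(x)
    return path


# Too small, reasonable, and too large learning rates
for eta in (0.01, 0.4, 1.1):
    path = gradient_descent(5.0, eta, steps=20)
    print(f"eta = {eta:<4}  x after 20 steps = {path[-1]:.4g}")
```

For this curve each step multiplies \( x \) by \( 1 - 2\eta \), so the iterates shrink toward the bottom only when \( |1 - 2\eta| < 1 \), that is, when \( 0 < \eta < 1 \). With \( \eta = 0.01 \) the factor is 0.98, so after 20 steps \( x \) has only crawled to roughly 3.3. With \( \eta = 0.4 \) the factor is 0.2, so \( x \) collapses to about \( 5 \times 10^{-14} \). With \( \eta = 1.1 \) the factor is \( -1.2 \), so \( x \) flips sign and grows on every step, ending near 190. On this particular curve, \( \eta = 0.5 \) lands exactly on the bottom in a single step.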
The gradient: slope in many directions at once
Real models have millions of parameters, not one. The gradient is just the collection of slopes — one per parameter — bundled into a vector that points in the direction of steepest increase. Gradient descent steps in the opposite direction. The 1-D ball you're rolling is the same idea; a real model rolls downhill in a million-dimensional bowl.
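To make "one slope per parameter" concrete, the sketch below fits a line \( \hat{y} = w x + b \) to five made-up points that lie exactly on \( y = 2x + 1 \), using mean squared error as the loss. The gradient is the pair of slopes \( (\partial L / \partial w, \partial L / \partial b) \), and each step moves both parameters against their own slopes at once. The data, the learning rate of 0.05, the step count, and the helper names are all hypothetical choices for illustration.

```python
# Hypothetical toy data lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse_loss(w, b):
    # Mean squared error of the line y_hat = w*x + b on the toy data
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, b):
    """Return (dL/dw, dL/db): one slope per parameter, bundled together."""
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    dw = 2 / n * sum(e * x for e, x in zip(errors, xs))
    db = 2 / n * sum(errors)
    return dw, db


w, b = 0.0, 0.0   # arbitrary starting parameters
eta = 0.05        # learning rate
for _ in range(2000):
    dw, db = gradient(w, b)
    # Step every parameter in the direction opposite its own slope
    w, b = w - eta * dw, b - eta * db

print(f"w = {w:.4f}, b = {b:.4f}, loss = {mse_loss(w, b):.2e}")
```

After the loop, \( w \) and \( b \) sit essentially at 2 and 1 and the loss is essentially zero. A model with a million parameters runs the same loop, only with a million slopes in the bundle.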