Module 8 · Capstone — The Math of a Tiny Model
Every tool in this course was building toward one thing: an actual model that learns. Now we run one end to end on a tiny dataset: logistic regression, one of the simplest real classifiers. You'll watch each pillar show up exactly where it belongs. Vectors, the dot product, probability, the loss function, the gradient step, and evaluation: all six, in one machine, on one page.
The task: will a student pass?
We predict whether a student passes an exam from two features: hours studied and practice tests taken. Each student is a point with a known outcome (pass = 1, fail = 0). The model's job: learn a rule that separates the passes from the fails — and report a probability, not just a guess.
Step 1 — Each example is a vector (Pillar 2: linear algebra)
A student becomes a feature vector \( \mathbf{x} = [\text{hours}, \text{tests}] \). The model holds a weight vector \( \mathbf{w} = [w_1, w_2] \) and a bias \( b \) — the numbers it will learn.
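As a minimal sketch, the setup above can be written as plain Python lists. The student values are made up for illustration, and starting the weights at zero is an assumption, not something the lesson specifies.

```python
# Hypothetical toy data: ([hours studied, practice tests taken], passed).
students = [
    ([1.0, 0.0], 0),
    ([2.0, 1.0], 0),
    ([4.0, 2.0], 1),
    ([6.0, 3.0], 1),
]

# The numbers the model will learn: one weight per feature, plus a bias.
# Starting at zero is an illustrative choice.
w = [0.0, 0.0]
b = 0.0

for x, y in students:
    print(f"x = {x}, y = {y}")
```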
Step 2 — Score the student with a dot product (Pillar 2)
Combine features and weights into one number, the exact dot product from Module 4 plus the bias:
\[ z = \mathbf{w} \cdot \mathbf{x} + b = w_1 \cdot \text{hours} + w_2 \cdot \text{tests} + b \]
A big positive \( z \) leans "pass," a big negative \( z \) leans "fail." But \( z \) can be any number — we need a probability.
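The score can be sketched in a few lines of Python. The function name `score` and the weights, bias, and student below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def score(w, x, b):
    """Return z = w . x + b for one feature vector."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

# Hypothetical weights, bias, and student (illustration only).
w = [0.5, 1.0]
b = -4.0
x = [4.0, 2.0]  # 4 hours studied, 2 practice tests

z = score(w, x, b)
print(z)  # 0.5*4 + 1.0*2 - 4 = 0.0
```

A score of exactly 0 sits on the fence between "pass" and "fail".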
Step 3 — Squash the score into a probability (Pillar 1: probability)
The sigmoid function takes any real number and squashes it into \( (0, 1) \), giving a probability (the Bernoulli parameter from Module 3):
\[ \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}} \]
Now \( \hat{y} \) reads as "the model's probability this student passes." That's a real prediction.
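Here is one way the squashing step might look in Python. Branching on the sign of the score is an implementation choice made in this sketch to avoid overflow in `math.exp`. It is not part of the lesson's definition.

```python
import math

def sigmoid(z):
    """Squash a real score into a probability, without overflowing exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe: z is negative, so exp(z) is at most 1
    return e / (1.0 + e)

for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3))
# -4.0 0.018
# 0.0 0.5
# 4.0 0.982
```

A score of 0 maps to a probability of 0.5, and scores mirrored around 0 give probabilities that sum to 1.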
Step 4 — Measure how wrong it is (Pillar 1 + notation from Module 1)
The loss compares the prediction \( \hat{y} \) to the truth \( y \). Logistic regression uses log loss (cross-entropy), and the log in it is exactly the product-into-sum trick from Module 1:
\[ L(\hat{y}, y) = -\big[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\big] \]
It's near 0 when the model is confidently right and large when confidently wrong. The average of this over all students is the number we want to shrink.
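To see "near 0 when confidently right, large when confidently wrong" in numbers, here is a small sketch of the per-example log loss. The clipping constant `eps` is an assumption added in this sketch so the log never receives exactly 0.

```python
import math

def log_loss(y, y_hat, eps=1e-12):
    """Cross-entropy for one example; y is 0 or 1, y_hat is a probability."""
    # Clip so log never sees exactly 0 (eps is an illustrative choice).
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

# A student who actually passed (y = 1), under three different predictions.
print(round(log_loss(1, 0.99), 3))  # confidently right -> 0.01
print(round(log_loss(1, 0.50), 3))  # unsure            -> 0.693
print(round(log_loss(1, 0.01), 3))  # confidently wrong -> 4.605
```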
Step 5 — Take one step downhill (Pillar 3: optimization)
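Compute the gradient of the loss with respect to each weight, then nudge every weight downhill, following the rule from Module 6:
\[ \mathbf{w} \leftarrow \mathbf{w} - \eta \, \frac{\partial L}{\partial \mathbf{w}}, \qquad b \leftarrow b - \eta \, \frac{\partial L}{\partial b} \]
Here \( \eta \) is the learning rate. For log loss with a sigmoid, the gradient works out to \( (\hat{y} - y)\,\mathbf{x} \) for the weights and \( (\hat{y} - y) \) for the bias.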
Repeat over the data and the loss falls — the model learns the weights. Run it below and watch every quantity move at once.
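For readers who want the loop in code, the following is a minimal sketch of full-batch gradient descent for this model. It is not the course demo's implementation: the dataset, the learning rate `lr`, the epoch count, and the function name `train` are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical toy data: ([hours, tests], passed).
data = [
    ([1.0, 0.0], 0), ([2.0, 1.0], 0), ([3.0, 1.0], 0),
    ([4.0, 2.0], 1), ([5.0, 2.0], 1), ([6.0, 3.0], 1),
]

def train(data, lr=0.1, epochs=1000):
    """Full-batch gradient descent; returns weights, bias, and loss history."""
    w, b = [0.0, 0.0], 0.0
    n = len(data)
    history = []
    for _ in range(epochs):
        grad_w, grad_b, total_loss = [0.0, 0.0], 0.0, 0.0
        for x, y in data:
            y_hat = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            # Clip before the log so a saturated prediction cannot hit log(0).
            p = min(max(y_hat, 1e-12), 1.0 - 1e-12)
            total_loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            err = y_hat - y  # gradient of the log loss with respect to z
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        history.append(total_loss / n)  # mean loss before this epoch's step
        # Step downhill along the averaged gradient.
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b, history

w, b, history = train(data)
print(f"mean loss: {history[0]:.3f} -> {history[-1]:.3f}")
print(f"w = {[round(v, 2) for v in w]}, b = {b:.2f}")
```

With all parameters at zero, every prediction is 0.5, so the first recorded loss is \( \log 2 \approx 0.693 \). Each later entry in `history` shows how far the updates have pushed it down.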
Step 6 — Judge the trained model (Pillar 4: statistics)
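Once trained, we score the model with the statistics from Module 7: accuracy, which is the mean of a 0/1 "was this prediction correct?" indicator over all students, and the decision boundary it learned, which is the line where \( \mathbf{w} \cdot \mathbf{x} + b = 0 \) and the predicted probability is exactly 0.5. A good model puts the passes on one side and the fails on the other. The demo reports accuracy as it trains.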
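The evaluation step can be sketched as follows. The weights here are hypothetical stand-ins for a trained model rather than learned values, and the last student is a made-up case included so the accuracy is not a perfect 1.0.

```python
# Hypothetical weights standing in for a trained model (not learned here).
w = [1.0, 1.0]
b = -5.0

# Hypothetical students: ([hours, tests], passed).
data = [
    ([1.0, 0.0], 0), ([2.0, 1.0], 0), ([3.0, 1.0], 0),
    ([4.0, 2.0], 1), ([5.0, 2.0], 1), ([6.0, 3.0], 1),
    ([2.0, 2.0], 1),  # a pass that this boundary gets wrong
]

def predict(x):
    """Predict 1 when z >= 0, which is the same as sigmoid(z) >= 0.5."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1 if z >= 0 else 0

# Accuracy is the mean of the 0/1 "correct" indicator.
correct = [predict(x) == y for x, y in data]
accuracy = sum(correct) / len(correct)
print(f"accuracy = {accuracy:.3f}")  # 6 of 7 correct -> 0.857

def boundary_tests(hours):
    """Tests value on the decision boundary for a given hours (needs w[1] != 0)."""
    return -(w[0] * hours + b) / w[1]

print(boundary_tests(3.0))  # 2.0: at 3 hours, the boundary sits at 2 tests
```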
Put the whole pipeline together
The synthesis check: match each step of the model to the pillar it came from, and read the math one last time. Pass it to complete the course.