
Module 8 · Capstone — The Math of a Tiny Model

Synthesis · hands-on · about 35 minutes.

Every tool in this course was building toward one thing: an actual model that learns. Now we run one — logistic regression, the simplest real classifier — end to end on a tiny dataset, and you'll watch each pillar show up exactly where it belongs. Vectors, the dot product, probability, the loss function, the gradient step, and evaluation: all six, in one machine, in one page.

The task: will a student pass?

We predict whether a student passes an exam from two features: hours studied and practice tests taken. Each student is a point with a known outcome (pass = 1, fail = 0). The model's job: learn a rule that separates the passes from the fails — and report a probability, not just a guess.

Step 1 — Each example is a vector (Pillar 2: linear algebra)

A student becomes a feature vector \( \mathbf{x} = [\text{hours}, \text{tests}] \). The model holds a weight vector \( \mathbf{w} = [w_1, w_2] \) and a bias \( b \) — the numbers it will learn.

Step 2 — Score the student with a dot product (Pillar 2)

Combine features and weights into one number — the exact dot product from Module 4, plus the bias:

\[ z \;=\; \mathbf{w}\cdot\mathbf{x} + b \;=\; w_1\,\text{hours} + w_2\,\text{tests} + b \]

A big positive \( z \) leans "pass," a big negative \( z \) leans "fail." But \( z \) can be any number — we need a probability.
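
To make Steps 1 and 2 concrete, here is a minimal Python sketch. The student, the weights, and the bias are made-up numbers chosen for easy arithmetic, not learned values, and `score` is only an illustrative name.

```python
def score(w, x, b):
    """Return z = w . x + b for one student."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical values, chosen only for illustration.
x = [4.0, 2.0]    # feature vector: 4 hours studied, 2 practice tests
w = [0.5, 0.25]   # weight vector [w1, w2]
b = -2.0          # bias

z = score(w, x, b)
print(z)  # 0.5*4 + 0.25*2 - 2 = 0.5
```

The score is slightly positive, so this hypothetical model leans "pass" for this student, but only weakly.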

Step 3 — Squash the score into a probability (Pillar 1: probability)

The sigmoid function takes any real number and squashes it into \( (0, 1) \) — a probability, the Bernoulli parameter from Module 3:

\[ \hat{y} \;=\; \sigma(z) \;=\; \frac{1}{1 + e^{-z}} \]

Now \( \hat{y} \) reads as "the model's probability this student passes." That's a real prediction.
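
A short Python sketch of the squashing step, for readers who want to see numbers. The two-branch form is an implementation choice made here (not something the formula requires) so that `math.exp` never overflows on very negative scores; both branches compute the same \( \sigma(z) \).

```python
import math

def sigmoid(z):
    """Map a real score z to a probability, sigma(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Algebraically the same value, rewritten so exp() gets a negative input.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# A few illustrative scores: very negative, zero, mildly positive, very positive.
for z in (-4.0, 0.0, 0.5, 4.0):
    print(z, round(sigmoid(z), 3))
```

A score of 0 maps to exactly 0.5, and scores of equal size but opposite sign map to probabilities that sum to 1.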

Step 4 — Measure how wrong it is (Pillar 1 + notation from Module 1)

The loss compares the prediction \( \hat{y} \) to the truth \( y \). Logistic regression uses log loss (cross-entropy) — and notice the log, exactly the product-into-sum trick from Module 1:

\[ L \;=\; -\big[\, y\,\log\hat{y} \;+\; (1-y)\log(1-\hat{y}) \,\big] \]

It's near 0 when the model is confidently right and large when confidently wrong. The average of this over all students is the number we want to shrink.
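
The sketch below evaluates the log loss for one passing student (\( y = 1 \)) at three levels of confidence. The `eps` clamp is an assumption added here so that `log(0)` can never occur; it is not part of the formula itself, and `log` is the natural logarithm.

```python
import math

def log_loss(y, y_hat, eps=1e-12):
    """Log loss for one example with true label y (0 or 1) and prediction y_hat."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # keep log() away from 0
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

print(round(log_loss(1, 0.99), 3))  # confidently right: 0.01
print(round(log_loss(1, 0.50), 3))  # unsure:            0.693
print(round(log_loss(1, 0.01), 3))  # confidently wrong: 4.605
```

The penalty for being confidently wrong is hundreds of times the loss for being confidently right, which is what pushes the model away from overconfident mistakes.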

Step 5 — Take one step downhill (Pillar 3: optimization)

Compute the gradient of the loss with respect to each weight and the bias, then nudge each one a small step downhill, with the learning rate \( \eta \) setting the size of the step. This is the rule from Module 6:

\[ \mathbf{w} \;\leftarrow\; \mathbf{w} - \eta\,\nabla_{\mathbf{w}} L, \qquad b \;\leftarrow\; b - \eta\,\frac{\partial L}{\partial b} \]
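
For a single student, the chain rule through the sigmoid and the log loss gives a compact gradient. This is a standard result, stated here without the full derivation:

\[ \frac{\partial L}{\partial z} \;=\; \hat{y} - y, \qquad \nabla_{\mathbf{w}} L \;=\; (\hat{y} - y)\,\mathbf{x}, \qquad \frac{\partial L}{\partial b} \;=\; \hat{y} - y \]

The gradient is the prediction error \( \hat{y} - y \) times the input. For the average loss over all students, average these per-student gradients.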

Repeat this update over the data and, as long as \( \eta \) is small enough, the loss falls: the model learns the weights. Run it below and watch every quantity move at once.
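
Here is a minimal training loop in plain Python that chains Steps 2 through 5. It is a sketch under stated assumptions: the six students are invented for illustration, the learning rate `eta=0.1` and the 500 steps are arbitrary choices, and the gradient uses the standard result that each student contributes \( (\hat{y} - y)\,\mathbf{x} \) for the weights and \( \hat{y} - y \) for the bias.

```python
import math

def sigmoid(z):
    """Sigmoid, written in two branches so exp() never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(w, b, x):
    """Steps 2 and 3: dot product plus bias, then sigmoid."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)

def mean_log_loss(w, b, X, y, eps=1e-12):
    """Step 4: log loss averaged over all students."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, b, xi), eps), 1.0 - eps)  # keep log() finite
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)

def gradient_step(w, b, X, y, eta):
    """Step 5: one gradient-descent update on the average loss."""
    n = len(X)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for xi, yi in zip(X, y):
        err = predict(w, b, xi) - yi  # (y_hat - y) for this student
        for j, xij in enumerate(xi):
            grad_w[j] += err * xij / n
        grad_b += err / n
    new_w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
    return new_w, b - eta * grad_b

# Hypothetical data: [hours studied, practice tests taken] -> pass (1) or fail (0).
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [5.0, 2.0], [6.0, 3.0], [8.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = [0.0, 0.0], 0.0  # start from zero weights: every prediction is 0.5
losses = [mean_log_loss(w, b, X, y)]
for _ in range(500):
    w, b = gradient_step(w, b, X, y, eta=0.1)
    losses.append(mean_log_loss(w, b, X, y))

print(f"loss before training: {losses[0]:.3f}")
print(f"loss after training:  {losses[-1]:.3f}")
print("learned w, b:", [round(wj, 2) for wj in w], round(b, 2))
```

With all-zero weights every prediction is 0.5, so the starting loss is \( \log 2 \approx 0.693 \); each update then lowers it. A much larger `eta` could overshoot and make the loss rise instead, which is why the step size matters.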


Step 6 — Judge the trained model (Pillar 4: statistics)

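Once trained, we score the model with the statistics from Module 7: accuracy (a mean, here the fraction of students classified correctly) and the decision boundary it learned. With the usual 0.5 cutoff, that boundary is the line where \( z = 0 \), since \( \sigma(0) = 0.5 \). A good model puts the passes on one side and the fails on the other. The demo reports accuracy as it trains.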
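
A small sketch of the evaluation step in Python. Everything specific here is an assumption for illustration: the six students are invented, the weights are hand-picked rather than trained, and the 0.5 probability cutoff (equivalently, \( z \ge 0 \)) is the conventional choice rather than something the model dictates.

```python
def predict_label(w, b, x):
    """Predict pass (1) when z >= 0, i.e. when sigmoid(z) >= 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0

def accuracy(w, b, X, y):
    """Fraction of students whose predicted label matches the true label."""
    correct = sum(predict_label(w, b, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)

# Hypothetical data: [hours studied, practice tests taken] -> pass (1) or fail (0).
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [5.0, 2.0], [6.0, 3.0], [8.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]

# Hand-picked weights standing in for a trained model.
# Decision boundary: 1.0*hours + 0.5*tests - 5.0 = 0.
w, b = [1.0, 0.5], -5.0
print(accuracy(w, b, X, y))  # 1.0: every student falls on the correct side
```

Shifting the bias to `-3.0` moves the boundary so that the student with 3 hours and 1 test lands on the "pass" side, and accuracy drops to 5 out of 6.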

AI anchor: you just traced a real model

This is not a toy analogy. Logistic regression is a production classifier, used for credit scoring, medical risk, spam, and click prediction, and it has exactly the form of one sigmoid neuron in a neural network: dot product, add bias, apply a squashing function, score the loss, step the gradient. Stack thousands of these across several layers and you have deep learning. The ideas you'll meet in Course 4 build on the six steps on this page.

Put the whole pipeline together

The synthesis check: match each step of the model to the pillar it came from, and read the math one last time. Pass it to complete the course.


Where this goes next

You can now read ML notation, reason with probability and Bayes, work with vectors and matrices, explain gradient descent, summarize data with statistics, and trace all of it through a complete model. That is the entire toolkit the rest of the track is built on. Machine Learning Foundations (Course 3) puts these tools to work in code; the neural networks of Course 4 are just many copies of the tiny model you just ran.

One-sentence summary: a logistic-regression model turns each example into a vector, scores it with a dot product (\( z = \mathbf{w}\cdot\mathbf{x}+b \)), squashes the score into a probability with the sigmoid, measures error with a log loss, and learns by stepping the weights downhill via gradient descent. That is every pillar of this course, working together in one machine.
