Module 8 · Capstone — The Mathematics of a Complete Model

Synthesis · about 35 minutes.

Every concept in this course contributes to a single objective: a model that learns from data. This capstone applies logistic regression, one of the simplest classifiers used in production, end to end on a small dataset, with each mathematical pillar appearing in its own role. Vectors, the dot product, probability, the loss function, the gradient step, and evaluation all come together in this one model.

The task: predicting whether a student passes

The objective is to predict whether a student passes an exam from two features: hours studied and practice tests taken. Each student is represented as a point with a known outcome (pass = 1, fail = 0). The model must learn a decision rule that separates passing from failing students and report a probability rather than a categorical prediction alone.

Step 1 — Each example is a vector (Pillar 2: linear algebra)

A student becomes a feature vector \( \mathbf{x} = [\text{hours}, \text{tests}] \). The model holds a weight vector \( \mathbf{w} = [w_1, w_2] \) and a bias \( b \) — the numbers it will learn.
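As a minimal sketch of this representation, the snippet below stores a handful of students as plain Python lists. The data values are invented for illustration and do not come from the course activity; the names `X`, `y`, `w`, and `b` are likewise assumptions chosen to mirror the notation above.

```python
# Hypothetical toy data: each row is [hours studied, practice tests taken].
X = [
    [1.0, 0.0],
    [2.0, 1.0],
    [3.0, 1.0],
    [5.0, 2.0],
    [6.0, 3.0],
    [8.0, 4.0],
]
y = [0, 0, 0, 1, 1, 1]  # known outcomes: pass = 1, fail = 0

# Parameters the model will learn; starting at zero is a common default.
w = [0.0, 0.0]  # one weight per feature: [w_1, w_2]
b = 0.0         # bias

for features, label in zip(X, y):
    print(f"x = {features}, y = {label}")
```

Each row of `X` plays the role of one \( \mathbf{x} \), and `w` has exactly one entry per feature.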

Step 2 — Score the student with a dot product (Pillar 2)

Combine features and weights into one number using the same dot product as in Module 4, plus the bias:

\[ z \;=\; \mathbf{w}\cdot\mathbf{x} + b \;=\; w_1\,\text{hours} + w_2\,\text{tests} + b \]

A large positive \( z \) favors "pass," and a large negative \( z \) favors "fail." However, \( z \) can take any real value, whereas a probability is required.
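The score can be sketched in a few lines of Python. The weights and bias below are hypothetical values picked only to make the arithmetic easy to follow; they are not learned values from the course activity.

```python
def score(w, x, b):
    """Return z = w . x + b for one example."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

# Hypothetical weights and bias, chosen only for illustration.
w = [0.5, 0.8]
b = -4.0

x = [5.0, 2.0]  # 5 hours studied, 2 practice tests
z = score(w, x, b)
print(z)  # 0.5*5 + 0.8*2 - 4, approximately 0.1 (up to floating-point rounding)
```

A score this close to zero sits almost exactly between "pass" and "fail", which is why the next step converts it to a probability.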

Step 3 — Map the score to a probability (Pillar 1: probability)

The sigmoid function maps any real number to the interval \( (0, 1) \), yielding a probability — the Bernoulli parameter from Module 3:

\[ \hat{y} \;=\; \sigma(z) \;=\; \frac{1}{1 + e^{-z}} \]

The quantity \( \hat{y} \) is now interpretable as the model's estimated probability that the student passes.
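A possible Python sketch of the sigmoid follows. It uses only the standard `math` module. The two-branch form is a design choice of this example (not something the course prescribes): it avoids overflow in `math.exp` for very negative scores.

```python
import math

def sigmoid(z):
    """Map any real score z to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equal form e^z / (1 + e^z)
    # so that math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-5.0, 0.0, 0.1, 5.0):
    print(z, round(sigmoid(z), 3))
```

A score of 0 maps to exactly 0.5, and the outputs for \( z \) and \( -z \) always sum to 1.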

Step 4 — Quantify the prediction error (Pillar 1 + notation from Module 1)

The loss compares the prediction \( \hat{y} \) to the true label \( y \). Logistic regression uses the log loss (cross-entropy); note the logarithm, which performs the product-to-sum conversion introduced in Module 1:

\[ L \;=\; -\big[\, y\,\log\hat{y} \;+\; (1-y)\log(1-\hat{y}) \,\big] \]

The loss is near 0 when the model is confidently correct and large when it is confidently incorrect. The mean of this loss over all students is the quantity to be minimized.
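The behavior described above can be checked with a short Python sketch. It assumes the natural logarithm; the clamping constant `eps` and the sample predictions are illustrative choices for this example, not values from the course.

```python
import math

def log_loss(y, y_hat, eps=1e-12):
    """Cross-entropy loss for one example with label y in {0, 1}."""
    # Clamp the prediction away from 0 and 1 so math.log stays finite.
    # eps is an illustrative choice, not a value from the course.
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

print(log_loss(1, 0.99))  # confidently correct: small loss
print(log_loss(1, 0.50))  # undecided: moderate loss
print(log_loss(1, 0.01))  # confidently wrong: large loss

# The quantity to minimize is the mean over all students.
labels = [1, 0, 1]
predictions = [0.9, 0.2, 0.6]
mean_loss = sum(log_loss(t, p) for t, p in zip(labels, predictions)) / len(labels)
print(mean_loss)
```

Because only one of the two terms is active for any given label, the loss reduces to \( -\log\hat{y} \) for a passing student and \( -\log(1-\hat{y}) \) for a failing one.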

Step 5 — Apply one gradient-descent update (Pillar 3: optimization)

Compute the gradient of the loss with respect to each weight, then update every weight in the direction of decreasing loss — the rule from Module 6:

\[ \mathbf{w} \;\leftarrow\; \mathbf{w} - \eta\,\nabla_{\mathbf{w}} L, \qquad b \;\leftarrow\; b - \eta\,\frac{\partial L}{\partial b} \]
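For the sigmoid of Step 3 and the log loss of Step 4, the gradient has a compact closed form. As a brief derivation sketch for a single student (assuming the natural logarithm in the loss), the chain rule gives:

\[ \frac{\partial L}{\partial \hat{y}} \;=\; \frac{\hat{y}-y}{\hat{y}\,(1-\hat{y})}, \qquad \frac{d\hat{y}}{dz} \;=\; \hat{y}\,(1-\hat{y}) \quad\Longrightarrow\quad \frac{\partial L}{\partial z} \;=\; \hat{y}-y \]

\[ \nabla_{\mathbf{w}} L \;=\; (\hat{y}-y)\,\mathbf{x}, \qquad \frac{\partial L}{\partial b} \;=\; \hat{y}-y \]

For the mean loss over all students, these per-student gradients are averaged. The update therefore moves each weight in proportion to the prediction error \( \hat{y}-y \) and to the corresponding feature value.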

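Iterating this update over the data decreases the loss, provided the learning rate \( \eta \) (the step size) is small enough; this is how the model learns its weights. The interactive activity for this step runs the procedure and shows every quantity updating together.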
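The loop can also be sketched offline in plain Python. This is an illustrative version under stated assumptions, not the code behind the course activity: the toy data, the learning rate `eta = 0.1`, the 500 iterations, and all function names are invented for the example. It uses the standard closed-form gradient of the log loss with a sigmoid output, \( (\hat{y}-y)\,\mathbf{x} \) for the weights and \( \hat{y}-y \) for the bias, averaged over all students.

```python
import math

def sigmoid(z):
    # Two-branch form avoids overflow in math.exp for very negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(w, b, x):
    return sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)

def mean_log_loss(w, b, X, y, eps=1e-12):
    total = 0.0
    for x_i, y_i in zip(X, y):
        # Clamp so math.log stays finite (eps is an illustrative choice).
        p = min(max(predict(w, b, x_i), eps), 1.0 - eps)
        total -= y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total / len(X)

def gradient_step(w, b, X, y, eta):
    """One batch update: w <- w - eta * grad_w, b <- b - eta * grad_b."""
    n = len(X)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x_i, y_i in zip(X, y):
        error = predict(w, b, x_i) - y_i  # (y_hat - y) for this student
        for j, x_ij in enumerate(x_i):
            grad_w[j] += error * x_ij / n
        grad_b += error / n
    new_w = [w_j - eta * g_j for w_j, g_j in zip(w, grad_w)]
    return new_w, b - eta * grad_b

# Hypothetical toy data: [hours studied, practice tests], pass = 1 / fail = 0.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [5.0, 2.0], [6.0, 3.0], [8.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]

w, b, eta = [0.0, 0.0], 0.0, 0.1  # eta is an illustrative learning rate
losses = [mean_log_loss(w, b, X, y)]
for step in range(500):
    w, b = gradient_step(w, b, X, y, eta)
    losses.append(mean_log_loss(w, b, X, y))

print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
print(f"w = {w}, b = {b:.3f}")
```

With all parameters at zero, every prediction starts at 0.5, so the initial mean loss is \( \log 2 \approx 0.693 \). A larger `eta` can make the loss oscillate or grow instead of shrink, which is why the step size matters.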


Step 6 — Evaluate the trained model (Pillar 4: statistics)

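Once trained, the model is evaluated using the statistics from Module 7: accuracy (the fraction of students classified correctly, which is a mean of 0/1 values) and the learned decision boundary, the line \( \mathbf{w}\cdot\mathbf{x} + b = 0 \) on which \( \hat{y} = 0.5 \). A well-fitted model places passing students on one side of this line and failing students on the other. The activity reports accuracy throughout training.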
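A small Python sketch of this evaluation is shown below. The weights stand in for the result of training and are hypothetical, as are the evaluation data and the 0.5 decision threshold; none of these values come from the course activity.

```python
import math

def predict_proba(w, b, x):
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights standing in for the result of training.
w, b = [0.5, 0.8], -4.0

# Hypothetical evaluation data: [hours, tests] with pass = 1, fail = 0.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0],
     [5.0, 2.0], [6.0, 3.0], [8.0, 4.0]]
y = [0, 0, 0, 1, 1, 1, 1]

# Predict "pass" when the estimated probability is at least 0.5.
predicted = [1 if predict_proba(w, b, x_i) >= 0.5 else 0 for x_i in X]

# Accuracy is the mean of the 0/1 "was this prediction correct?" values.
correct = [1 if p == t else 0 for p, t in zip(predicted, y)]
accuracy = sum(correct) / len(correct)
print(f"accuracy = {accuracy:.3f}")  # 6 of 7 correct: 0.857

# The decision boundary is the line where z = 0, i.e. y_hat = 0.5:
#   w_1 * hours + w_2 * tests + b = 0
def boundary_tests(hours):
    """Practice tests at which a student with these hours sits on the boundary."""
    return -(w[0] * hours + b) / w[1]

print(boundary_tests(4.0))  # about 2.5 tests at 4 hours
```

The one misclassified student (4 hours, 1 test, passed) lies on the "fail" side of the boundary, which illustrates that accuracy below 1 corresponds to points on the wrong side of the line.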

AI anchor — you have traced the execution of a real model

This is not a simplified analogy. Logistic regression is a production classifier, used in credit scoring, medical risk assessment, spam filtering, and click-through prediction, and it has the same structure as a single sigmoid neuron of a neural network: dot product, bias addition, a non-linear activation, loss computation, and a gradient step. Composing many such units across multiple layers yields deep learning. The concepts introduced in Course 4 build on the six steps developed in this module.

Capstone — synthesis across the course

This synthesis assessment requires matching each step of the model to the mathematical pillar from which it derives. Completing it concludes the course.


Subsequent courses

You can now read ML notation, reason with probability and Bayes' rule, manipulate vectors and matrices, explain gradient descent, summarize data using statistics, and trace all of these through a complete model. This constitutes the mathematical foundation on which the remainder of the track is built. Machine Learning Foundations (Course 3) applies these tools in code; the neural networks of Course 4 are compositions of many instances of the model developed here.

Summary: a logistic-regression model represents each example as a vector, computes a score via a dot product (\( z = \mathbf{w}\cdot\mathbf{x}+b \)), maps the score to a probability with the sigmoid function, quantifies error with the log loss, and learns by updating the weights via gradient descent — integrating every mathematical pillar of this course into a single model.
