Module 2 — Regression: Fitting a Line

Supervised learning · hands-on · about 30 minutes.

The simplest real model predicts a number: How much will this house sell for? How long will this delivery take? That is regression, and the workhorse is the straight line. In this module you will drag a line over real data, watch the error rise and fall, and then let the computer find the best line, which is exactly what scikit-learn's LinearRegression does.

A line is a model with two knobs

A straight-line model has two parameters: a slope \( w \) and an intercept \( b \). Given an input \( x \), its prediction is:

\[ \hat{y} \;=\; w\,x + b \]

Change \( w \) and the line tilts; change \( b \) and it slides up or down. "Fitting" the model means choosing \( w \) and \( b \) so the line passes as close as possible to the points.
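
To make the two knobs concrete, here is a minimal sketch of the prediction rule in Python. The function name predict and the values of w, b, and x are assumptions chosen only for illustration; they do not come from the lesson's data.

```python
def predict(x, w, b):
    """Straight-line prediction: y_hat = w * x + b."""
    return w * x + b

# Hypothetical knob settings, chosen only for illustration.
w, b = 2.0, 1.0

y_hat = predict(3.0, w, b)        # 2.0 * 3.0 + 1.0 = 7.0
steeper = predict(3.0, 3.0, b)    # a larger slope tilts the line: 10.0
shifted = predict(3.0, w, 2.0)    # a larger intercept slides it up: 8.0

print(y_hat, steeper, shifted)
```

Changing only w changes how fast the prediction grows with x; changing only b moves every prediction by the same amount.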

Residuals: how wrong is each prediction?

For a point \( (x_i, y_i) \), the residual is the vertical gap between the real value and the line’s guess: \( y_i - \hat{y}_i \). Points above the line have positive residuals, points below it have negative ones. We want every residual as close to zero as possible.
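
As a small sketch of that definition, the snippet below computes one residual per point. The three points and the line's w and b are made up for illustration.

```python
# Made-up points and a hypothetical line, for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [3.5, 4.0, 7.0]
w, b = 2.0, 1.0

# residual = actual value minus the line's prediction
residuals = [y - (w * x + b) for x, y in zip(xs, ys)]

print(residuals)  # [0.5, -1.0, 0.0]
```

The first point sits above the line (positive residual), the second below it (negative), and the third exactly on it (zero).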

The cost: mean squared error

To turn all those gaps into one number to minimize, we square each residual (so positives and negatives don’t cancel, and big misses hurt more) and average them — the mean squared error:

\[ \text{MSE} \;=\; \frac{1}{n}\sum_{i=1}^{n}\big(y_i - (w x_i + b)\big)^2 \]
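
The formula translates almost word for word into code. This is a minimal sketch: the helper name mse and the four points (placed exactly on \( y = 2x + 1 \)) are assumptions for illustration.

```python
def mse(xs, ys, w, b):
    """Mean squared error of the line y_hat = w*x + b on the given points."""
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n

# Made-up points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

perfect = mse(xs, ys, 2.0, 1.0)    # every residual is 0  -> 0.0
off_by_1 = mse(xs, ys, 2.0, 0.0)   # every residual is 1  -> 1.0
off_by_2 = mse(xs, ys, 2.0, -1.0)  # every residual is 2  -> 4.0

print(perfect, off_by_1, off_by_2)
```

Doubling every miss from 1 to 2 quadruples the cost from 1.0 to 4.0, which is what "big misses hurt more" means in practice.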

Drag the sliders below. The faint red sticks are the residuals; the MSE number is their average squared length. Your job: make it as small as you can — then hit Auto-fit and see how close you got to the best possible line.

How the computer finds the best line

You minimized MSE by hand. A computer can do it in two ways: with a one-shot formula (the normal equation, also called ordinary least squares), or by gradient descent, which starts anywhere and repeatedly steps \( w \) and \( b \) downhill on the MSE surface until it bottoms out (Course 2, Module 6). For a straight line both arrive at the same answer, provided gradient descent uses a small enough step and runs long enough; gradient descent is the approach that scales to models with millions of parameters.
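
The two routes can be compared side by side. The sketch below is not how any particular library implements fitting; it uses made-up data, the single-input form of the least-squares solution, and an assumed learning rate and step count for gradient descent.

```python
import numpy as np

# Made-up data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.9, 5.2, 6.8, 9.1, 11.0])

# Route 1: one-shot least-squares formula for a single input.
#   w = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   b = mean_y - w * mean_x
x_mean, y_mean = x.mean(), y.mean()
w_ols = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b_ols = y_mean - w_ols * x_mean

# Route 2: gradient descent on the MSE, starting from w = b = 0.
w, b = 0.0, 0.0
learning_rate = 0.02          # assumed step size; too large would diverge
for _ in range(20000):        # assumed number of steps
    residual = y - (w * x + b)
    grad_w = -2.0 * np.mean(residual * x)   # dMSE/dw
    grad_b = -2.0 * np.mean(residual)       # dMSE/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("formula:         ", w_ols, b_ols)
print("gradient descent:", w, b)
```

On this data both routes land on the same slope and intercept to within numerical tolerance. The formula gets there in one step, while gradient descent needs many small ones.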

The same thing in scikit-learn — run it right here, nothing to install
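import numpy as np
from sklearn.linear_model import LinearRegression

# Toy numbers, made up for illustration; X must be 2-D (n_samples, n_features)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.1, 4.9, 7.2, 8.8])
X_test = np.array([[5.0], [6.0]])

model = LinearRegression()
model.fit(X_train, y_train)      # learns w (slope) and b (intercept)
pred = model.predict(X_test)     # ŷ = wx + b on unseen data
print(model.coef_, model.intercept_)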

The .fit() call is the "Train" box from Module 1 — you just did its job by hand with the sliders. Hit Run it yourself to fit a real line in your browser, then change the numbers and rerun.

AI anchor: regression is everywhere prices and amounts live

Zillow-style home-value estimates, demand forecasts, ad click-through prediction, estimated delivery times, and the "expected revenue" models behind ad auctions are all regression. The output is a number on a continuous scale, and the model is trained by shrinking squared (or similar) error. Even a neural network’s final layer is often just a regression: a weighted sum, \( w\cdot x + b \), exactly the line you tuned here, only with many more inputs.
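
With several inputs, the slope becomes one weight per input and the product becomes a dot product. Here is a minimal sketch with hypothetical weights and feature values, not taken from any real model.

```python
import numpy as np

# Hypothetical weights and features, for illustration only.
w = np.array([0.5, -1.0, 2.0])   # one weight per input
x = np.array([4.0, 1.0, 0.5])    # one example with three features
b = 0.25                         # intercept (bias)

# Weighted sum: 0.5*4.0 + (-1.0)*1.0 + 2.0*0.5 + 0.25
y_hat = np.dot(w, x) + b

print(y_hat)  # 2.25
```

With a single input this reduces to the same \( \hat{y} = wx + b \) used throughout the module.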

Read the regression

Answer a few questions about lines, residuals, and error. You will get a score.

Why this matters next

Swap the straight line for an S-shaped curve and "predict a number" becomes "predict a probability": that is logistic regression, the gateway to classification (Module 3). The squared-error idea you just used also returns in Module 8, where it helps tell a model that fits well from one that overfits.

One-sentence summary: regression predicts a number with a line \( \hat{y} = wx + b \), and "fitting" means choosing \( w \) and \( b \) to minimize the mean squared error, the average squared gap between the points and the line.
