Module 2 — Regression: Fitting a Line
The simplest real model predicts a number: How much will this house sell for? How long will this delivery take? That is regression, and the workhorse is the straight line. In this module you will drag a line over real data, watch the error rise and fall, and then let the computer find the best line, which is exactly what scikit-learn's LinearRegression does.
A line is a model with two knobs
A straight-line model has two parameters: a slope \( w \) and an intercept \( b \). Given an input \( x \), its prediction is:

\[ \hat{y} = wx + b \]
Change \( w \) and the line tilts; change \( b \) and it slides up or down. "Fitting" the model means choosing \( w \) and \( b \) so the line passes as close as possible to the points.
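As a minimal sketch, the two-knob model fits in a one-line Python function that computes \( \hat{y} = wx + b \). The function name `predict` and the knob settings below are made up for illustration; they are not part of any library.

```python
def predict(x, w, b):
    """Straight-line model: slope w tilts the line, intercept b slides it."""
    return w * x + b


# Hypothetical knob settings, chosen only to show the effect of each knob.
base = predict(4.0, w=2.0, b=1.0)     # 2*4 + 1 = 9.0
steeper = predict(4.0, w=3.0, b=1.0)  # larger slope: 3*4 + 1 = 13.0
shifted = predict(4.0, w=2.0, b=3.0)  # larger intercept: 2*4 + 3 = 11.0

print(base, steeper, shifted)
```

Changing only `w` changes how fast the prediction grows with `x`; changing only `b` adds the same amount to every prediction.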
Residuals: how wrong is each prediction?
For a point \( (x_i, y_i) \), the residual is the vertical gap between the real value and the line’s guess: \( y_i - \hat{y}_i \). Some points sit above the line (positive), some below (negative). We want them all small.
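Here is a small sketch of residuals in NumPy, assuming four made-up points and a made-up candidate line with \( w = 2 \) and \( b = 1 \). Each residual is the real value minus the line's guess, so the sign tells you whether the point sits above or below the line.

```python
import numpy as np

# Made-up points and a made-up candidate line, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.5, 4.0, 7.0, 10.0])
w, b = 2.0, 1.0

y_hat = w * x + b       # the line's guesses: 3, 5, 7, 9
residuals = y - y_hat   # real value minus guess: 0.5, -1.0, 0.0, 1.0

for xi, ri in zip(x, residuals):
    if ri > 0:
        side = "above the line"
    elif ri < 0:
        side = "below the line"
    else:
        side = "on the line"
    print(f"x = {xi}: residual = {ri}, point is {side}")
```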
The cost: mean squared error
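To turn all those gaps into one number to minimize, we square each residual (so positives and negatives don’t cancel, and big misses hurt more) and average them over all \( n \) points. The result is the mean squared error:

\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]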
Drag the sliders below. The faint red sticks are the residuals; the MSE number is their average squared length. Your job: make it as small as you can — then hit Auto-fit and see how close you got to the best possible line.
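The slider experiment can also be sketched in code: pick a slope and intercept, square each residual, and average. The helper name `mse` and the data are assumptions made up for this example. The third candidate line shows why squaring matters: its residuals add up to zero, yet the line is clearly wrong.

```python
import numpy as np


def mse(x, y, w, b):
    """Mean squared error of the line y_hat = w*x + b on the points (x, y)."""
    residuals = y - (w * x + b)
    return float(np.mean(residuals ** 2))


# Made-up data lying exactly on y = 2x + 1, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

perfect = mse(x, y, w=2.0, b=1.0)   # every residual is 0
too_high = mse(x, y, w=2.0, b=2.0)  # every residual is -1
too_flat = mse(x, y, w=1.0, b=3.0)  # residuals are -2, -1, 0, 1, 2

# The residuals of the too-flat line cancel when simply added up.
flat_residual_sum = float(np.sum(y - (1.0 * x + 3.0)))

print(perfect, too_high, too_flat, flat_residual_sum)
```

Trying different values of `w` and `b` here is the same game as dragging the sliders: the lower the number, the closer the line sits to the points.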
How the computer finds the best line
You minimized MSE by hand. A computer can do it in two ways: with a one-shot formula (the normal equation, which solves ordinary least squares exactly), or by gradient descent, where you start anywhere and repeatedly step \( w \) and \( b \) downhill on the MSE surface until it bottoms out (Course 2, Module 6). For a straight line both arrive at the same answer, provided the gradient-descent steps are small enough to converge; gradient descent is what scales to models with millions of parameters.
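To make the two routes concrete, here is a rough sketch that fits the same synthetic data both ways. Everything in it is an assumption for illustration: the data, the learning rate `lr`, the step count, and the starting point of zero. The one-shot route uses the standard least-squares formulas for a single input; the gradient-descent loop uses the derivatives of MSE with respect to \( w \) and \( b \).

```python
import numpy as np

# Synthetic data (an assumption): true line y = 2.5x + 7 plus random noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 40)
y = 2.5 * x + 7 + rng.normal(0, 3, size=40)

# Route 1: one-shot formula (ordinary least squares with one input).
#   w = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   b = mean_y - w * mean_x
x_mean, y_mean = x.mean(), y.mean()
w_ols = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b_ols = y_mean - w_ols * x_mean

# Route 2: gradient descent on MSE, starting from w = 0, b = 0.
w, b = 0.0, 0.0
lr = 0.01      # learning rate, picked by hand; too large a value diverges
steps = 5000   # picked by hand; enough for this small problem
for _ in range(steps):
    residuals = y - (w * x + b)
    grad_w = -2.0 * np.mean(residuals * x)  # d(MSE)/dw
    grad_b = -2.0 * np.mean(residuals)      # d(MSE)/db
    w -= lr * grad_w                        # step downhill
    b -= lr * grad_b

print("one-shot formula:", w_ols, b_ols)
print("gradient descent:", w, b)
```

With these settings the two pairs of numbers should agree to many decimal places. With a larger learning rate or far fewer steps they may not, which is the practical price of the method that scales.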
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)    # learns w (slope) and b (intercept)
pred = model.predict(X_test)   # ŷ = wx + b on unseen data
print(model.coef_, model.intercept_)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Noisy linear data: the "true" line is y = 2.5x + 7
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 40).reshape(-1, 1)
y = 2.5 * X.ravel() + 7 + rng.normal(0, 3, size=40)

model = LinearRegression()
model.fit(X, y)    # learns w (slope) and b (intercept)

print("slope w =", round(float(model.coef_[0]), 3))
print("intercept b =", round(float(model.intercept_), 3))
print("R^2 score =", round(model.score(X, y), 3))

plt.figure(figsize=(5, 3.2))
plt.scatter(X, y, s=18, label="data")
plt.plot(X, model.predict(X), color="crimson", linewidth=2, label="best-fit line")
plt.legend()
plt.title("LinearRegression fit")
plt.tight_layout()
plt.show()
The .fit() call is the "Train" box from Module 1 — you just did its job by hand with the sliders. Hit Run it yourself to fit a real line in your browser, then change the numbers and rerun.
Read the regression
Answer a few questions about lines, residuals, and error. You will get a score.