Module 7 — Dimensionality Reduction: PCA

Unsupervised learning · hands-on · about 30 minutes.

Real data often has hundreds of features — far too many to see or reason about. Dimensionality reduction squashes those many features down to a handful while keeping as much of the data’s shape as possible. The classic method is Principal Component Analysis (PCA): it finds the directions the data actually varies in, and lets you keep only the strongest few.

The key idea: variance is information

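Picture a stretched, tilted oval of points. Most of the spread runs along its long axis; very little runs across the short one. PCA finds those axes, the principal components, and orders them by how much the data varies along each: PC1 is the direction of greatest variance (the long axis), and PC2 is the perpendicular direction that captures what is left (the short axis).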

If almost all the spread lives along PC1, you can throw PC2 away and describe each point by a single number — its position along PC1 — losing almost nothing. That’s reduction: two features become one.
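As a rough sketch of what "finding the axes" means numerically, the principal components can be read off the eigenvectors of the data's covariance matrix. The example below uses NumPy on a made-up tilted oval; the 30-degree tilt, the spreads, and the helper name principal_axes are all illustrative assumptions, not part of the lesson.

```python
import numpy as np

def principal_axes(X):
    """Return (variances, directions), sorted from the largest variance to the smallest."""
    cov = np.cov(X, rowvar=False)               # feature-by-feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort so that PC1 comes first
    return eigvals[order], eigvecs[:, order].T  # one direction per row

# Hypothetical data: an oval with spread 3.0 along its long axis and 0.5 across it,
# tilted by 30 degrees.
rng = np.random.default_rng(0)
oval = np.column_stack([rng.normal(scale=3.0, size=500),
                        rng.normal(scale=0.5, size=500)])
tilt = np.deg2rad(30)
rotation = np.array([[np.cos(tilt), -np.sin(tilt)],
                     [np.sin(tilt),  np.cos(tilt)]])
X = oval @ rotation.T

variances, directions = principal_axes(X)
share = variances / variances.sum()
print("share of variance on PC1, PC2:", share.round(3))
print("PC1 direction:", directions[0].round(3))
```

With data this elongated, nearly all of the variance should land on PC1, and the PC1 direction should sit close to the 30-degree tilt (up to sign, since a line has no preferred direction).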

Projecting onto a line

"Reducing to one dimension" means projecting every point onto a single line — sliding it straight onto the line at a right angle. The amount of spread you keep is the retained variance. Project onto PC1 and you keep the most possible; project onto any other line and you keep less.

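To make the projection concrete, here is a minimal NumPy sketch. It assumes 2-D data and a line through the data's mean at a chosen angle; the function name project_onto_line and the sample data are hypothetical. It also checks a useful fact: the variance kept along the line plus the variance discarded across it adds up to the total.

```python
import numpy as np

def project_onto_line(X, angle_deg):
    """Project 2-D points onto a line through their mean at angle_deg.

    Returns (positions, kept, lost): each point's signed position along the line,
    the fraction of total variance kept, and the fraction discarded.
    """
    Xc = X - X.mean(axis=0)                   # centre the data on its mean
    a = np.deg2rad(angle_deg)
    u = np.array([np.cos(a), np.sin(a)])      # unit vector along the line
    positions = Xc @ u                        # one number per point
    residual = Xc - np.outer(positions, u)    # the part at right angles to the line
    total = (Xc ** 2).sum()
    kept = (positions ** 2).sum() / total
    lost = (residual ** 2).sum() / total
    return positions, kept, lost

# Hypothetical data: points scattered around the line y = 0.5 * x.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
X = np.column_stack([x, 0.5 * x + 0.2 * rng.normal(size=300)])

for angle in (0, 27, 90):
    _, kept, lost = project_onto_line(X, angle)
    print(f"angle {angle:>2} deg: kept {kept:.3f}, lost {lost:.3f}")
```

Each point is replaced by a single number, its position along the line. The kept and lost fractions should sum to 1 at every angle, and a line running along the cloud should keep far more than one running across it.
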
Spin the line, watch the variance

Rotate the projection line below. The bar shows what fraction of the data’s total variance survives the projection. Find the angle that maxes it out — you’ve just found PC1 by hand. Then press Snap to PC1 to see the exact answer PCA computes.
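The same experiment can be run without the slider. This sketch sweeps the line through half a turn in one-degree steps, records the retained variance at each angle, and compares the best angle with the first component reported by scikit-learn's PCA. The data and the one-degree step are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: an elongated cloud scattered around the line y = 0.5 * x.
rng = np.random.default_rng(2)
x = rng.normal(size=400)
X = np.column_stack([x, 0.5 * x + 0.2 * rng.normal(size=400)])
Xc = X - X.mean(axis=0)

angles = np.arange(180)                                       # 0..179 degrees
radians = np.deg2rad(angles)
lines = np.column_stack([np.cos(radians), np.sin(radians)])   # one unit vector per angle
positions = Xc @ lines.T                                      # shape (n_points, n_angles)
retained = positions.var(axis=0) / Xc.var(axis=0).sum()       # fraction kept at each angle

best_angle = angles[np.argmax(retained)]

pca = PCA(n_components=1).fit(X)
pc1 = pca.components_[0]
# A line's direction is only defined up to sign, so fold the angle into [0, 180).
pca_angle = np.degrees(np.arctan2(pc1[1], pc1[0])) % 180

print("best angle by hand:", best_angle)
print("PC1 angle from PCA:", round(float(pca_angle), 1))
print("retained at best angle:", round(float(retained.max()), 3))
print("explained_variance_ratio_:", pca.explained_variance_ratio_.round(3))
```

The hand-found angle should land within a degree of the PCA angle, and the retained variance at that angle should match explained_variance_ratio_ to within the coarseness of the one-degree grid.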

The same thing in scikit-learn — run it right here, nothing to install
from sklearn.decomposition import PCA

# X is the data as an array of shape (n_samples, n_features); it is assumed to be defined already.
pca = PCA(n_components=1)              # keep just the strongest direction
Z = pca.fit_transform(X)               # each point → its position on PC1
print(pca.explained_variance_ratio_)   # fraction of variance kept, e.g. [0.92]

explained_variance_ratio_ is exactly the "retained variance" bar from the activity: the fraction of the data’s total variance that survives after dropping a dimension. Press "Run it yourself" to see how little is lost when the data really lives along one direction.
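One common use of explained_variance_ratio_ is deciding how many components to keep. The sketch below fits PCA with all components on made-up five-feature data (two hidden factors plus a little noise, all assumed for illustration), accumulates the ratios, and picks the smallest number of components that reaches a 95% target. The 95% threshold is an arbitrary choice, not a rule from the lesson.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 5 observed features driven by 2 hidden factors plus small noise.
rng = np.random.default_rng(3)
factors = rng.normal(size=(300, 2)) * np.array([3.0, 1.5])
mixing = rng.normal(size=(2, 5))
X = factors @ mixing + 0.1 * rng.normal(size=(300, 5))

pca = PCA().fit(X)                        # no n_components given: keep every component
ratios = pca.explained_variance_ratio_    # sorted from largest to smallest
cumulative = np.cumsum(ratios)

target = 0.95                             # arbitrary illustrative threshold
# searchsorted finds the first index whose cumulative ratio reaches the target.
k = int(np.searchsorted(cumulative, target) + 1)

print("ratio per component:", ratios.round(3))
print("cumulative:", cumulative.round(3))
print("components needed for 95%:", k)
```

Because only two hidden factors drive this data, the cumulative ratio should be close to 1 after at most two components, and the remaining components should carry little beyond noise.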

AI anchor: squeezing big data down to size

PCA and its cousins show up wherever high-dimensional data needs taming: visualizing a 50-feature dataset in 2-D, compressing images, speeding up models by feeding them fewer inputs, and removing redundant correlated features before training. Modern embeddings (the vectors behind search and recommendation) are often reduced the same way before they are plotted or compared. Whenever someone plots a many-feature dataset in 2-D, some form of dimensionality reduction made that picture possible.
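As an illustration of the visualization and compression uses, this sketch reduces a made-up 50-feature dataset to two coordinates per row (the kind you would hand to a scatter plot) and then uses inverse_transform to rebuild an approximation of the original 50 features. The dataset, with two hidden factors behind all 50 features, is an assumption chosen so that two components suffice; real data will usually lose more.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 400 rows, 50 correlated features generated from 2 hidden factors.
rng = np.random.default_rng(4)
factors = rng.normal(size=(400, 2))
mixing = rng.normal(size=(2, 50))
X = factors @ mixing + 0.1 * rng.normal(size=(400, 50))

pca = PCA(n_components=2)
Z = pca.fit_transform(X)            # shape (400, 2): coordinates for a 2-D scatter plot
X_back = pca.inverse_transform(Z)   # shape (400, 50): rebuilt from 2 numbers per row

# Squared reconstruction error relative to the total spread around the mean.
Xc = X - X.mean(axis=0)
relative_error = ((X - X_back) ** 2).sum() / (Xc ** 2).sum()

print("reduced shape:", Z.shape)
print("variance kept:", round(float(pca.explained_variance_ratio_.sum()), 3))
print("relative reconstruction error:", round(float(relative_error), 3))
```

The reconstruction error is one minus the variance kept, so the two printed fractions should add up to about 1.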

Reduce the claims

A few questions on components, variance, and projection. You will get a score.

Why this matters next

You’ve now met every kind of model the course covers: regression, classification, naive Bayes, trees, clustering, and PCA. The last module asks the question that decides whether any of them is trustworthy: does it actually work on data it has never seen? Module 8 is the capstone, covering overfitting, train/test splits, cross-validation, and the bias–variance tradeoff that has haunted every slider in this course.

One-sentence summary: PCA reduces dimensions by finding the orthogonal directions of greatest variance (the principal components) and projecting the data onto the strongest few, keeping the most "shape" (retained variance) in the fewest features.

Next: Honest Evaluation — the capstone →