Module 6 — Clustering: k-means

Unsupervised learning · hands-on · about 30 minutes.

Every model so far needed labels — someone had to tag each example "spam" or "not," "class A" or "B." Now we cut the labels away. Unsupervised learning finds structure in data nobody has tagged. The most famous example is k-means clustering: hand it a cloud of points and a number \( k \), and it discovers \( k \) groups on its own.

The idea: pick centers, then settle

k-means looks for \( k \) centroids — the centers of \( k \) groups — by repeating two dead-simple steps until nothing moves:

  1. Assign: color each point by its nearest centroid.
  2. Update: move each centroid to the average of the points that chose it.

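That's the whole algorithm. Assign, update, assign, update: neither step can increase the total spread, and there are only finitely many ways to split the points into groups, so the loop always settles. Where it settles is a local optimum, which is not guaranteed to be the best possible grouping. The quantity it drives down is the inertia: the sum of squared distances from every point to its centroid.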

\[ \text{inertia} \;=\; \sum_{i} \lVert x_i - \mu_{c(i)} \rVert^2 \]

where \( \mu_{c(i)} \) is the centroid of the cluster point \( x_i \) was assigned to. Lower inertia means tighter, cleaner groups.
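The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production implementation: the function name `kmeans`, the choice to start from randomly picked data points, and the toy three-blob data are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, n_rounds=100, seed=0):
    """Plain k-means: alternate assign and update until assignments stop changing."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    history = []  # inertia measured after each assign step
    for _ in range(n_rounds):
        # Assign: squared distance from every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        history.append(d2[np.arange(len(X)), new_labels].sum())
        if labels is not None and np.array_equal(new_labels, labels):
            break  # nothing moved, so the algorithm has settled
        labels = new_labels
        # Update: move each centroid to the mean of the points that chose it.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels, history

# Toy data (assumed for illustration): three Gaussian blobs of 50 points each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

centroids, labels, history = kmeans(X, k=3)
print(np.round(history, 2))  # inertia per round; it should never go up
```

The `history` list records the inertia after every assign step, so printing it is a quick way to check that each round leaves the total spread the same or lower.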

Run it yourself

Set \( k \), then press Step to watch one assign-and-update round at a time, or Run to let it converge. The centroids (the big rings) start in random spots and walk toward the heart of each cluster. Reset re-seeds them — notice the final groups can change depending on where the centroids started.
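The dependence on starting positions can also be seen in code. The sketch below assumes scikit-learn is available and uses synthetic data from `make_blobs` as a stand-in for the demo's point cloud; it runs k-means once per seed from purely random starting centroids and prints the inertia each run ends with.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic data with 4 blobs stands in for the demo's point cloud.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)

inertias = []
for seed in range(5):
    # init="random" with n_init=1 means one run from one random starting position.
    km = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed)
    km.fit(X)
    inertias.append(km.inertia_)
    print(f"seed={seed}  inertia={km.inertia_:.1f}")

# A common remedy: keep the best of several restarts.
best = min(inertias)
print(f"best of 5 restarts: {best:.1f}")
```

If the printed inertias differ, the runs settled in different local optima. scikit-learn's `n_init` parameter performs this restart-and-keep-the-best loop for you, and its default `init="k-means++"` picks starting centroids that are spread apart.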

Choosing k: the elbow

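k-means can't tell you how many groups exist; you pick \( k \). Too few and you merge distinct groups; too many and you split one group into meaningless shards. A common trick is the elbow: plot inertia as \( k \) grows. Inertia keeps falling as \( k \) increases (with one centroid per point it reaches zero), so the lowest inertia is not the goal. Instead, look for where the drop slows sharply, just past the "true" number of clusters. That bend is a good \( k \), although on real data it is often less sharp than on toy examples.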
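As a rough sketch of the elbow check, the snippet below fits k-means for \( k = 1 \) to \( 8 \) and prints the inertia alongside how much it fell from the previous \( k \). It assumes scikit-learn is available and uses three well-separated synthetic blobs, so the "true" answer is known to be 3; real data will rarely be this clean.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: three well-separated synthetic blobs, so the "true" k is 3.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 6], [0, 6]],
                  cluster_std=0.6, random_state=0)

ks = list(range(1, 9))
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# Print inertia and how much it fell compared with the previous k.
for i, k in enumerate(ks):
    drop = inertias[i - 1] - inertias[i] if i > 0 else float("nan")
    print(f"k={k}  inertia={inertias[i]:9.1f}  drop={drop:9.1f}")
```

On data like this, the drops up to \( k = 3 \) should be large and the ones after it small; that change in slope is the elbow. Plotting `inertias` against `ks` shows the same thing visually.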

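The same thing in scikit-learn
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: 300 points around 3 centers (a stand-in for your own X)
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)  # k, the number of clusters
km.fit(X)                    # no labels y: that's what "unsupervised" means
print(km.labels_[:10])       # which cluster each of the first 10 points landed in
print(km.cluster_centers_)   # the final centroids
print(km.inertia_)           # the spread it minimized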

Notice that fit(X) is called without y. (scikit-learn's fit does accept a y argument for API consistency, but KMeans ignores it.) There are no right answers to learn from: the structure comes entirely from the data’s shape. Try changing n_clusters and watch the inertia and the centroids move.

AI anchor: finding groups nobody labeled

Clustering is how you make sense of data before anyone has tagged it: customer segments for marketing, grouping similar documents or images, compressing colors in a photo, spotting anomalies (a point far from every centroid is suspicious), even the first pass at organizing a genome. It’s the workhorse of exploratory analysis, the step where you ask "what natural groups are hiding in here?" before you commit to a labeled, supervised model.
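The anomaly idea ("a point far from every centroid is suspicious") can be sketched as a distance score. The example below is illustrative only: it assumes scikit-learn is available, uses synthetic blobs, and adds one hand-placed outlier at (20, 20) so there is something to find.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: three synthetic blobs plus one hand-placed outlier at (20, 20).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 6], [0, 6]],
                  cluster_std=0.6, random_state=0)
outlier = np.array([[20.0, 20.0]])
X_all = np.vstack([X, outlier])

# Fit on the clean data only, then score every point including the outlier.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# transform() returns each point's distance to every centroid;
# the row minimum is the distance to the nearest one.
nearest_dist = km.transform(X_all).min(axis=1)

most_suspicious = int(nearest_dist.argmax())
print(most_suspicious, round(float(nearest_dist[most_suspicious]), 2))
```

Here the score is simply the distance to the nearest centroid, and the largest score belongs to the planted outlier (the last row). Where to draw the line between "far" and "normal" is a judgment call that depends on the data.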

Why this matters next

Clustering groups points that sit close together; the next tool, dimensionality reduction (Module 7), instead finds the directions the data actually varies in, squashing many features down to a few while keeping as much of the shape as possible. Together they’re two core tools of unsupervised learning: find the groups, then find the axes.

One-sentence summary: k-means is an unsupervised method that finds \( k \) groups in unlabeled data by repeating "assign each point to its nearest centroid, then move each centroid to its points’ average," driving down the inertia \( \sum_{i} \lVert x_i - \mu_{c(i)} \rVert^2 \) until it settles.
