
Module 3 — Classification: Drawing Boundaries

Supervised learning · hands-on · about 30 minutes.

Where regression predicts a continuous quantity, classification predicts a categorical outcome drawn from a finite label set — spam or legitimate, benign or malignant, one of several object classes. The learning task is to estimate a decision boundary that partitions the feature space into regions associated with each class; a new observation is then assigned the label of the region into which it falls. In this module you construct the most transparent such classifier by hand.

The simplest classifier: ask the neighbours

k-nearest-neighbours (k-NN) is a non-parametric, instance-based method: it estimates no parameters and performs no optimisation during training, instead storing the labelled examples themselves. To classify a query point, it identifies the \( k \) training instances nearest to it and assigns the plurality (majority) class among them. The hyperparameter \( k \) controls the bias–variance trade-off: \( k = 1 \) reproduces the label of the single closest example (low bias, high variance), whereas a larger \( k \) averages over a wider neighbourhood, suppressing noise at the cost of a smoother, more biased boundary. Proximity is measured by a distance metric — conventionally the Euclidean norm \( \lVert \mathbf{x} - \mathbf{x}_i \rVert_2 \) introduced in Course 2, Module 4.
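The procedure just described fits in a few lines of plain Python. The sketch below is a from-scratch illustration, not scikit-learn's implementation; `knn_predict` and the toy data are hypothetical. It sorts the training points by Euclidean distance to the query, keeps the first \( k \), and returns the plurality label. The toy data deliberately place one stray class-1 point inside the class-0 cluster, so the same query changes label as \( k \) grows.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Return (label, votes) for `query` by plurality vote of its k nearest training points."""
    # Rank training indices by Euclidean distance to the query (stable sort keeps input order on ties).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    # Count the labels of the k closest points; iteration follows distance order.
    votes = Counter(train_y[i] for i in order[:k])
    # most_common lists equal counts in first-seen order, so a tied vote goes to
    # the tied class encountered first in distance order.
    return votes.most_common(1)[0][0], votes


# Hypothetical toy data: a class-0 cluster near the origin with one stray class-1 point
# at (1.6, 1.6), and a class-1 cluster near (4, 4).
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (1.6, 1.6),
     (4.0, 4.0), (4.0, 5.0), (5.0, 4.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
query = (1.4, 1.4)

for k in (1, 3, 5):
    label, votes = knn_predict(X, y, query, k)
    print(f"k={k}: predicted class {label}, votes {dict(votes)}")
# k=1 -> class 1 (the stray point is nearest); k=3 and k=5 -> class 0
```

With \( k = 1 \) the single stray point decides; from \( k = 3 \) onward the surrounding class-0 points outvote it, which is the variance-versus-bias trade-off in miniature. Breaking ties in favour of the class met first in distance order is one of several reasonable rules.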

Click anywhere in the plot to position a query point. The widget marks its \( k \) nearest neighbours, reports their vote, and shades each region by its predicted class. Adjust \( k \) and observe how the prediction at a contested location changes as additional neighbours are enfranchised. (A query point is placed for you at the centre to begin.)


The decision boundary

Evaluating the classifier at every location in the feature space partitions it into class regions; the locus along which the predicted label changes is the decision boundary, rendered as the shaded background above. For small \( k \) the boundary is highly irregular, contorting around individual training points — a symptom of overfitting (high variance). As \( k \) increases, the boundary becomes smoother and more stable, exchanging variance for bias.
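To see the boundary numerically rather than visually, one can evaluate the classifier at every node of a grid and inspect the resulting label map. The sketch below does this in plain Python on hypothetical toy data with one stray class-1 point among class-0 points; `label_grid` and the 0.5 grid spacing are illustrative choices, not part of any library. It counts the class-1 "island" cells that form around the stray point and reports training accuracy for \( k = 1 \) and \( k = 5 \).

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    # Plurality vote among the k training points nearest to `query` (Euclidean distance).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def label_grid(train_X, train_y, k, lo=0.0, hi=5.0, step=0.5):
    # Evaluate the classifier at every node of a square grid; the resulting
    # label map is a discretised picture of the decision regions.
    n = int(round((hi - lo) / step)) + 1
    coords = [lo + i * step for i in range(n)]
    return {(gx, gy): knn_predict(train_X, train_y, (gx, gy), k)
            for gx in coords for gy in coords}


# Hypothetical toy data: one stray class-1 point (1.6, 1.6) inside the class-0 cluster.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (1.6, 1.6),
     (4.0, 4.0), (4.0, 5.0), (5.0, 4.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

for k in (1, 5):
    grid = label_grid(X, y, k)
    # Class-1 cells inside the class-0 corner (x < 2, y < 2): an "island" around the stray point.
    island = sum(1 for (gx, gy), lab in grid.items() if gx < 2 and gy < 2 and lab == 1)
    train_acc = sum(knn_predict(X, y, p, k) == t for p, t in zip(X, y)) / len(X)
    print(f"k={k}: island cells = {island}, training accuracy = {train_acc:.3f}")
```

On this toy data, \( k = 1 \) carves a small class-1 island around the stray point and scores perfectly on the training set, while \( k = 5 \) erases the island and misclassifies that one point (training accuracy 0.875): a smoother boundary bought with a little bias.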

From votes to probabilities

Most classifiers report not only a predicted label but an estimated posterior probability of class membership. Logistic regression (Course 2, Module 8) forms a linear score \( z = \mathbf{w}\cdot\mathbf{x} + b \) and maps it into \( (0,1) \) through the logistic (sigmoid) function:

\[ \hat{p} \;=\; \sigma(z) \;=\; \frac{1}{1 + e^{-z}} \]
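The two steps, a linear score followed by the sigmoid, translate directly into code. The sketch below assumes hypothetical weights \( \mathbf{w} = (1.5, -2.0) \) and bias \( b = 0.5 \); in practice these would be learned from data. The sigmoid is written in two branches so that `math.exp` is never called on a large positive argument, which would overflow.

```python
import math


def sigmoid(z):
    # Logistic function 1 / (1 + e^{-z}); each branch only ever exponentiates
    # a non-positive number, so math.exp cannot overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_proba(w, b, x):
    # Linear score z = w . x + b, then squash into (0, 1).
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Hypothetical parameters, chosen only for illustration.
w, b = (1.5, -2.0), 0.5
print(sigmoid(0.0))                        # 0.5: a zero score lies exactly on the boundary
print(logistic_proba(w, b, (2.0, 1.0)))    # z = 3.0 - 2.0 + 0.5 = 1.5, so p-hat is about 0.818
```

The point where \( \mathbf{w}\cdot\mathbf{x} + b = 0 \) maps to \( \hat{p} = 0.5 \), so with the conventional threshold the decision boundary of logistic regression is the hyperplane \( \mathbf{w}\cdot\mathbf{x} + b = 0 \).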

A decision threshold (conventionally \( 0.5 \)) converts this probability into a label: the positive class is predicted when \( \hat{p} \) exceeds the threshold. Shifting the threshold trades false positives against false negatives — the operating-point choice formalised in Module 8.
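The threshold trade-off can be checked by counting errors on a handful of scored examples. The scores and labels below are invented for illustration, and `confusion_at` is a hypothetical helper; as in the text, an example is predicted positive only when \( \hat{p} \) strictly exceeds the threshold.

```python
def confusion_at(threshold, probs, labels):
    # Predict positive when p-hat exceeds the threshold, then count
    # false positives (predicted 1, truly 0) and false negatives (predicted 0, truly 1).
    fp = sum(1 for p, t in zip(probs, labels) if p > threshold and t == 0)
    fn = sum(1 for p, t in zip(probs, labels) if p <= threshold and t == 1)
    return fp, fn


# Hypothetical model outputs and ground-truth labels for eight examples.
probs = [0.05, 0.20, 0.35, 0.45, 0.55, 0.70, 0.85, 0.95]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

for threshold in (0.3, 0.5, 0.7):
    fp, fn = confusion_at(threshold, probs, labels)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
# threshold=0.3 -> (2, 0); 0.5 -> (1, 1); 0.7 -> (0, 2)
```

Raising the threshold from 0.3 to 0.7 moves the errors from two false positives to two false negatives on this toy data; where to sit along that trade-off is the operating-point question.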

The same thing in scikit-learn
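```python
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the lesson's two-cluster data (an assumption: the page's own dataset is not shown)
X_train, y_train = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)                # "training" just stores the labelled points
print(clf.score(X_train, y_train))       # training accuracy
print(clf.predict([[2.0, 3.1]]))         # majority vote of the 5 nearest (lesson data: [0])
print(clf.predict_proba([[2.0, 3.1]]))   # vote share per class (lesson data: [[0.6 0.4]], 3 of 5 in class 0)
```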

The snippet above is an excerpt of the lesson's full program. On the lesson's two-cluster dataset, that program prints train accuracy = 0.838, predict([2.0, 3.1]) → 0, and predict_proba → [0.6, 0.4], then plots the \( k = 5 \) decision boundary. The training accuracy is well below 100% because the classes overlap and \( k = 5 \) deliberately smooths over individual points. Reducing n_neighbors toward 1 sharpens the boundary and raises the training accuracy: at n_neighbors=1 each training point is its own nearest neighbour, so training accuracy reaches 100% (unless identical points carry different labels), which is exactly the overfitting regime. Swapping in LogisticRegression() gives a smooth linear boundary whose probabilities come from the sigmoid of a linear score rather than from neighbour vote counts.
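The two experiments described here, shrinking \( k \) and swapping in logistic regression, can be scripted together. This sketch uses synthetic data from scikit-learn's `make_blobs` (an assumption: it is not the lesson's dataset, so the accuracies will not match the lesson's figures) together with the standard estimators `KNeighborsClassifier` and `LogisticRegression`.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, overlapping two-cluster data (assumed stand-in, not the lesson's dataset).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)

for k in (1, 5, 25):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    # Training accuracy; at k=1 every training point is its own nearest neighbour.
    print(f"k={k:>2}: train accuracy = {knn.score(X, y):.3f}")

# Logistic regression: a linear boundary, probabilities from the sigmoid of w.x + b.
logreg = LogisticRegression().fit(X, y)
print(logreg.predict_proba([[2.0, 3.1]]))  # one row: [P(class 0), P(class 1)]
```

Expect a training accuracy of exactly 1.000 at \( k = 1 \) and typically lower values for larger \( k \); the logistic model returns one row of two class probabilities that sum to 1.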

AI anchor: classification runs the alarms. Spam filters, fraud detection, medical triage, content moderation, and the final layer of an image classifier are all classification. The model draws a boundary in feature space and reports which side a new example falls on, usually with a probability. Picking the threshold is a real product decision: a cancer screen leans toward catching every true case (few misses) even at the cost of more false alarms, a choice you can only reason about with the evaluation tools in Module 8.
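The screening example can be made concrete: fix the recall you need (the fraction of true cases caught), then take the highest threshold that still achieves it. The sketch below uses a hypothetical helper, `highest_threshold_for_recall`, on invented scores, scanning a coarse grid of thresholds from 0.9 down to 0.1 and predicting positive when \( \hat{p} \) exceeds the threshold.

```python
def highest_threshold_for_recall(probs, labels, target_recall, thresholds):
    # Walk thresholds from high to low; return the first (largest) one whose recall
    # TP / (TP + FN) meets the target, with that recall and the false-positive count.
    n_pos = sum(labels)
    for thr in sorted(thresholds, reverse=True):
        predicted = [p > thr for p in probs]  # positive when p-hat exceeds the threshold
        tp = sum(1 for pred, t in zip(predicted, labels) if pred and t == 1)
        fp = sum(1 for pred, t in zip(predicted, labels) if pred and t == 0)
        if tp / n_pos >= target_recall:
            return thr, tp / n_pos, fp
    return None  # no candidate threshold reaches the target


# Hypothetical screening scores and true labels (1 = condition present).
probs = [0.04, 0.18, 0.33, 0.47, 0.52, 0.68, 0.81, 0.93]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
thresholds = [i / 10 for i in range(9, 0, -1)]  # 0.9, 0.8, ..., 0.1

for target in (0.75, 1.0):
    print(target, highest_threshold_for_recall(probs, labels, target, thresholds))
# 0.75 -> (0.5, 0.75, 1); 1.0 -> (0.3, 1.0, 2)
```

On this toy data, a recall target of 0.75 allows a threshold of 0.5 with one false alarm, while demanding that every true case be caught pushes the threshold down to 0.3 and doubles the false alarms to two.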


Why this matters next: k-NN measures distance; the next classifier, naive Bayes (Module 4), instead multiplies probabilities, turning Bayes’ rule from Course 2 into a working spam filter. Both draw boundaries; they just disagree about what "close" means. And every classifier here is only as trustworthy as the honest evaluation in Module 8.

One-sentence summary: classification predicts a category by drawing a decision boundary; k-nearest-neighbours does it by majority vote of the \( k \) closest labelled points, while logistic regression outputs a probability \( \sigma(\mathbf{w}\cdot\mathbf{x} + b) \) that you threshold into a class.
