
Module 5 — Decision Trees

Supervised learning · hands-on · about 30 minutes.

A decision tree is one of the most human-readable models in machine learning: it classifies by asking a sequence of yes/no questions about the features, like a flowchart. "Is income > $50k? If yes, is age < 30?" In this module you will grow a tree one level at a time, watch it carve the data into boxes, and then watch it overfit.

Splits: one question at a time

At each step the tree picks the single feature-and-threshold question that best separates the classes — for example "is \( x_1 < 3.2 \)?" That split divides the data into two groups. The tree then repeats inside each group, asking another question, building a flowchart of splits that ends in leaves where it commits to a class.
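To make the flowchart idea concrete, here is a minimal sketch of a tree written by hand as nested dictionaries. The features `x1` and `x2`, the thresholds, and the class labels "A" and "B" are made up for illustration; nothing here is learned from data.

```python
def predict(node, x):
    """Follow yes/no answers from the root until a leaf is reached."""
    while "label" not in node:
        answer = x[node["feature"]] < node["threshold"]  # the yes/no question
        node = node["yes"] if answer else node["no"]
    return node["label"]

# Hypothetical tree: two questions, three leaves.
tree = {
    "feature": "x1", "threshold": 3.2,          # is x1 < 3.2?
    "yes": {"label": "A"},
    "no": {
        "feature": "x2", "threshold": 1.5,      # follow-up: is x2 < 1.5?
        "yes": {"label": "B"},
        "no": {"label": "A"},
    },
}

print(predict(tree, {"x1": 2.0, "x2": 9.9}))  # answers "yes" at the root
print(predict(tree, {"x1": 4.0, "x2": 1.0}))  # answers "no", then "yes"
```

Each internal node stores one question and each leaf stores the class it commits to, so a prediction is just a walk from the root to a leaf.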

How it picks the "best" split: impurity

"Best" means the split that makes each side as pure as possible — ideally all one class. A common purity score is the Gini impurity of a group:

\[ \text{Gini} \;=\; 1 - \sum_{c} p_c^2 \]

where \( p_c \) is the fraction of the group in class \( c \). It is 0 for a perfectly pure group and 0.5 for a 50/50 mix of two classes (with \( K \) classes the maximum is \( 1 - 1/K \)). The tree greedily chooses the split that lowers the weighted impurity the most, where each side is weighted by its share of the points. No calculus, no gradient: just "try the splits, keep the best."
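The impurity formula and the "try the splits, keep the best" search can be sketched in plain Python for a single feature. The helper names (`gini`, `weighted_gini`, `best_threshold`) and the toy data are assumptions for illustration; real libraries use faster, more careful versions of the same idea.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity after a split: each side weighted by its share of the points."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(xs, ys):
    """Try a cut between each pair of neighbouring values; keep the purest."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        score = weighted_gini(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best  # (threshold, weighted impurity), or None if no cut exists

# Toy data (made up): class "a" sits below 3.5, class "b" above it.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(gini(ys))                # impurity of the unsplit 50/50 group
print(best_threshold(xs, ys))  # the cut that separates the two classes
```

A full tree builder would run this search over every feature, take the best feature-and-threshold pair, and then recurse into each side.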

Grow the tree

Slide max depth up. At depth 1 the tree asks one question — one straight cut. Each extra level lets it ask follow-up questions, bending the boundary into more boxes. Watch training accuracy climb… and keep an eye on whether the deep boxes are catching a real pattern or just lassoing single noisy points.


Depth is a double-edged sword

A shallow tree may underfit — too few questions to capture the pattern. A very deep tree can drive training accuracy to 100% by drawing a tiny box around every point, but those boxes won’t survive on new data: it has overfit. The right depth is a balance (the bias–variance tradeoff of Module 8), usually found by checking a held-out set.
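One way to check depth against a held-out set is sketched below. The synthetic `make_moons` data, the 70/30 split, and the list of depths are assumptions chosen for illustration, not part of the lesson's own dataset.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy two-class data (an assumption; any labelled data works).
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for depth in [1, 2, 3, 5, 8, None]:  # None: grow until every leaf is pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    # Store (training accuracy, held-out accuracy) for this depth.
    results[depth] = (clf.score(X_train, y_train), clf.score(X_val, y_val))
    print(depth, results[depth])

# Pick the depth with the best held-out accuracy, not the best training accuracy.
best_depth = max(results, key=lambda d: results[d][1])
print("best depth on the held-out set:", best_depth)
```

The unlimited tree fits the training points perfectly here, so training accuracy alone cannot tell you when to stop; the held-out column is the one to compare across depths.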

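The same thing in scikit-learn
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(noise=0.3, random_state=0)  # stand-in data (an assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3)  # the slider you just moved
clf.fit(X_train, y_train)                  # greedily picks splits by impurity
print(clf.score(X_test, y_test))           # accuracy on unseen data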

max_depth is one of the most important knobs: it directly limits how far the tree can overfit, exactly as the slider shows. Run the code, then bump max_depth up and compare clf.score on the training and test sets to watch the gap widen.
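A fitted tree can also be printed as the flowchart of questions it learned. The sketch below uses scikit-learn's `export_text`; the synthetic data and the feature names `x1` and `x2` are assumptions for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic two-feature data (an assumption; substitute your own X and y).
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# One line per question or leaf, indented by depth.
rules = export_text(clf, feature_names=["x1", "x2"])
print(rules)
```

Note that scikit-learn phrases each question as "feature <= threshold", and that the printout grows quickly as max_depth rises, which is one practical reason to keep trees shallow when readability matters.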

AI anchor: the model that rules tabular data

A single tree is rarely used alone, but combine hundreds of them and you get random forests and gradient-boosted trees (XGBoost, LightGBM). These models still win many competitions on spreadsheet-style data and power many real-world systems: credit scoring, churn, ad ranking, risk. They are popular because they need relatively little tuning, handle mixed feature types, and, through feature importances, can say which questions mattered, a rare and valuable kind of explainability.
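As a rough illustration of feature importances, the sketch below fits a random forest on synthetic data where only the first two of five columns carry signal. The dataset settings and the forest size are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# With shuffle=False the two informative columns come first (indices 0 and 1);
# the remaining three columns are pure noise.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: one non-negative value per feature, summing to 1.
importances = forest.feature_importances_
for i, imp in enumerate(importances):
    print(f"feature {i}: {imp:.3f}")
```

These scores summarise how much each feature's splits reduced impurity across the forest. They are a useful first look rather than a full explanation, since impurity-based importances can favour features with many distinct values.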


Why this matters next

Trees, k-NN, naive Bayes, and regression all needed labels. Module 6 cuts the labels away: k-means clustering finds groups in data nobody has tagged. This is the unsupervised half of machine learning, and the start of how models find structure on their own.

One-sentence summary: a decision tree classifies with a flowchart of yes/no feature splits, greedily chosen to lower impurity (Gini \( = 1 - \sum_{c} p_c^2 \)); deeper trees fit the training data better but overfit, so depth must be tuned on unseen data.
