Module 7 — Statistics for Data & Evaluation

Pillar 4 · Statistics · hands-on · about 30 minutes.

Models are built from data and judged by numbers. Statistics is the toolkit for both: summarizing a pile of data into a few honest numbers, and reading the metrics that tell you whether a model is any good. This module covers the summaries — mean, spread, the bell curve — and the traps that fool people who skip them.

Center: mean and median

The mean (average) adds the values and divides by how many — the balance point of the data:

\[ \bar{x} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i \]

The median is the middle value when the data is sorted. For roughly symmetric data the two land close together, but when a few huge values pull the mean while the median holds steady, that gap is itself information (think incomes, or response times). Edit the dataset below and watch both move.
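
As a rough illustration of the two definitions, here is a minimal Python sketch. The `response_ms` values are invented for the example, and averaging the two middle values when the count is even is a common convention that the text above does not spell out.

```python
def mean(values):
    """Balance point: add the values and divide by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical response times in milliseconds; one slow request skews the data.
response_ms = [120, 130, 125, 140, 135, 2000]

avg = mean(response_ms)
mid_value = median(response_ms)
print(f"mean = {avg:.1f} ms, median = {mid_value:.1f} ms")
```

The single slow request pulls the mean far above every typical value, while the median stays among them.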

Spread: variance and standard deviation

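Center isn't enough: you also need to know how spread out the data is. Variance averages the squared distance from the mean; standard deviation is its square root, back in the original units. The form below divides by \( n \), which treats the data as the whole population; when estimating from a sample, it is common to divide by \( n - 1 \) instead.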

\[ \sigma^2 \;=\; \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \sigma \;=\; \sqrt{\sigma^2} \]

Small σ means the data huddles near the mean; large σ means it's scattered. Standard deviation is everywhere in ML: it's how we normalize features, measure noise, and report the spread of a model's errors.
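
The formula above translates almost line for line into code. The sketch below is one way to write it in Python, using the divide-by-n form shown in the equation; the `standardize` helper is a hypothetical name illustrating the feature normalization mentioned above, and it assumes the data has nonzero spread.

```python
import math

def population_variance(values):
    """Average squared distance from the mean (divides by n, as in the formula above)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def population_std(values):
    """Square root of the variance, back in the original units."""
    return math.sqrt(population_variance(values))

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (assumes the spread is not zero)."""
    m = sum(values) / len(values)
    s = population_std(values)
    return [(x - m) / s for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # small made-up dataset with mean 5
print(population_variance(data))  # 4.0
print(population_std(data))       # 2.0
print(standardize(data))
```

After standardizing, each value reads as "how many standard deviations from the mean", which is what makes features on different scales comparable.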

The normal distribution: the bell curve

Many natural quantities pile up symmetrically around a mean: heights, measurement noise, the errors a good model makes. That shape is the normal distribution, described entirely by its mean (where the peak sits) and its standard deviation (how wide). A handy rule for normally distributed data: about 68% falls within one σ of the mean, about 95% within two, and about 99.7% within three. The histogram demo overlays this curve so you can compare your data to the ideal bell.
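
The rule of thumb can be checked against the ideal curve itself. This is a small sketch using `NormalDist` from Python's standard `statistics` module; `fraction_within` is a hypothetical helper name, and the result applies to an exactly normal distribution, not to arbitrary real data.

```python
from statistics import NormalDist

def fraction_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lands within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # Roughly 0.68, 0.95, and 0.997 for k = 1, 2, 3.
    print(f"within {k} sigma: {fraction_within(k):.4f}")
```

The fractions do not depend on the particular mean or standard deviation: `fraction_within(1, mu=170, sigma=10)` gives the same value as the default call.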

Correlation — and why it isn't causation

Correlation measures whether two variables move together, summarized by \( r \) from −1 (a perfect inverse linear relationship) through 0 (no linear relationship) to +1 (a perfect direct linear relationship). But correlation is not causation: ice-cream sales and drownings rise together (both driven by summer heat), yet neither causes the other. Models exploit correlation to predict, and that misleads anyone who confuses it with cause.
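
The text does not give a formula for \( r \); the sketch below assumes the usual Pearson correlation coefficient (the sum of paired deviations divided by the product of each variable's deviation magnitude). The ice-cream and drowning figures are invented purely to illustrate the example above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # How much x and y deviate from their means together.
    co_movement = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Scale by each variable's own spread so the result lies in [-1, 1].
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

# Hypothetical weekly figures; both happen to rise as the weather warms.
ice_cream_sales = [20, 30, 45, 60, 80]
drownings = [1, 2, 2, 4, 5]

r = pearson_r(ice_cream_sales, drownings)
print(f"r = {r:.2f}")  # strongly positive, yet neither variable causes the other
```

A high \( r \) here says only that the two series move together; nothing in the calculation knows about the shared cause (temperature).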

Why averages mislead

A single number hides a lot. The same mean can come from tightly-clustered data or wildly scattered data; one outlier can drag an average somewhere no actual data point lives. "The average user…" is often a person who doesn't exist. The demo below lets you drop an outlier into a dataset and watch the mean lurch while the median barely flinches — the reason robust reporting shows both.
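
Both effects are easy to reproduce with a few lines of Python. The datasets below are made up for illustration; `pstdev` is the standard library's population standard deviation.

```python
from statistics import mean, median, pstdev

# Two hypothetical datasets with the same mean but very different spread.
tight = [48, 49, 50, 51, 52]
scattered = [10, 30, 50, 70, 90]

for name, values in (("tight", tight), ("scattered", scattered)):
    print(f"{name}: mean={mean(values)}, std={pstdev(values):.2f}")

# Drop a single extreme value into the tight dataset.
with_outlier = tight + [500]

print(f"before: mean={mean(tight)}, median={median(tight)}")
print(f"after:  mean={mean(with_outlier)}, median={median(with_outlier)}")
```

With the outlier added, the mean jumps to a value that none of the data points is near, while the median moves by half a unit.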

AI anchor: reading evaluation metrics

Every claim about a model is a statistic. Accuracy is a mean (the fraction correct). A model that's "95% accurate" on data where 95% of cases are one class may have learned nothing, because always guessing the majority class scores exactly the same; that's why you also report precision and recall (the conditional probabilities from Module 2). Reporting a metric without its standard deviation across runs hides whether the result is reliable or luck. And confusing correlation with causation is how a model that merely predicts gets mistaken for one that explains. Statistics is what keeps model evaluation honest.
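
The majority-class trap can be shown with a toy example. In this sketch the labels, the always-guess-0 "model", and the per-run accuracies are all invented; returning 0.0 when precision or recall is undefined is a convention assumed here, not something the lesson specifies.

```python
from statistics import mean, stdev

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label (a mean of 0/1 outcomes)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class; 0.0 when undefined (assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# 95 negatives and 5 positives; the "model" always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
prec, rec = precision_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, precision={prec:.2f}, recall={rec:.2f}")

# Hypothetical accuracies from five training runs with different seeds.
run_accuracies = [0.91, 0.93, 0.90, 0.94, 0.92]
# stdev is the sample standard deviation (divides by n - 1), suited to a handful of runs.
print(f"accuracy = {mean(run_accuracies):.3f} +/- {stdev(run_accuracies):.3f}")
```

The accuracy looks strong while recall on the rare class is zero, and the second print shows the mean-plus-spread style of reporting across runs.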

Don't get fooled

Spot the statistical trap in each scenario. You'll get a score.

Why this matters next

Statistics is how you'll evaluate every model in Courses 3 and 4: accuracy, precision/recall, and the spread of results across runs. It's also how you'll prepare data: normalizing features by mean and standard deviation is a standard preprocessing step. The normal distribution returns whenever you reason about noise and uncertainty, and correlation is the raw material of every predictive feature.

One-sentence summary: the mean \( \bar{x} = \frac{1}{n}\sum x_i \) gives the center and the standard deviation σ gives the spread; the normal curve describes data that piles up around a mean; correlation measures co-movement (never causation); and reading these honestly is exactly what model evaluation requires.
