Module 3 — Bayes & Random Variables
Module 2 asked "given one clue, how likely is the event?" This module gives you the engine for updating a belief as evidence arrives — Bayes' rule — and the language of random variables that lets a model talk about uncertain quantities. Together they are how machines reason under uncertainty.
Bayes' rule: turning the question around
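Often you know one conditional but want the other. A medical test tells you \( P(\text{positive} \mid \text{sick}) \), how often it fires for sick people. But the patient wants \( P(\text{sick} \mid \text{positive}) \): given a positive result, am I actually sick? Bayes' rule flips one into the other. For a hypothesis \( H \) and evidence \( E \), with \( P(E) \) the overall probability of seeing that evidence:
\[ P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)} \]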
Read it as three named pieces:
- Prior \( P(H) \) — what you believed before the evidence (the base rate).
- Likelihood \( P(E \mid H) \) — how well the hypothesis explains the evidence.
- Posterior \( P(H \mid E) \) — your updated belief after seeing the evidence. It becomes the new prior for the next clue.
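To make the three pieces concrete, here is a minimal Python sketch for the two-hypothesis case (either \( H \) is true or it is not). The function name `posterior`, its parameter names, and the example numbers are assumptions made for illustration; the denominator \( P(E) \) is expanded with the law of total probability.

```python
def posterior(prior: float, likelihood: float, likelihood_if_not: float) -> float:
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    # P(E) via the law of total probability over "H" and "not H".
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("the evidence has zero probability under both hypotheses")
    return likelihood * prior / evidence


# Hypothetical numbers: a 50/50 prior, and evidence that is four times
# more likely when H is true (0.8) than when it is false (0.2).
belief = posterior(prior=0.5, likelihood=0.8, likelihood_if_not=0.2)
print(round(belief, 3))  # 0.8
```

Feeding the returned value back in as `prior` for the next clue is exactly the "posterior becomes the new prior" step described above.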
The base-rate trap
The single most common probability mistake is ignoring the prior. When a disease affects 1 in 1,000 people, a positive result from a "99% accurate" test is usually a false alarm: healthy people so outnumber sick ones that even a small false-positive rate produces more false alarms than true cases.
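Put numbers to it. The prior is \( P(\text{sick}) = 0.001 \). Take "99% accurate" to mean the test is right 99% of the time for sick and healthy people alike, so \( P(+ \mid \text{sick}) = 0.99 \) and the false-positive rate is \( P(+ \mid \text{healthy}) = 0.01 \). Writing Bayes' rule out in full:
\[ P(\text{sick} \mid +) = \frac{P(+ \mid \text{sick})\, P(\text{sick})}{P(+ \mid \text{sick})\, P(\text{sick}) + P(+ \mid \text{healthy})\, P(\text{healthy})} = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999} \approx 0.09 \]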
Per 1,000 people that is about 1 true positive against roughly 10 false alarms, so only about 1 positive result in 11 is real: the rare base rate dominates. The slider demo makes this undeniable: drag the disease prevalence down to "rare" and watch a positive result stay mostly wrong.
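If the counting argument feels slippery, it can help to run it on a larger crowd. The sketch below uses the lesson's numbers (1 in 1,000 sick, 99% hit rate, 1% false-positive rate); the cohort size of 100,000 and all variable names are assumptions chosen for illustration.

```python
prior = 0.001           # P(sick): 1 in 1,000
sensitivity = 0.99      # P(+ | sick)
false_positive = 0.01   # P(+ | healthy)

population = 100_000    # hypothetical cohort size
sick = population * prior
healthy = population - sick

true_positives = sick * sensitivity          # sick people who test positive
false_positives = healthy * false_positive   # healthy people who test positive

# Among everyone who tests positive, what fraction is actually sick?
p_sick_given_positive = true_positives / (true_positives + false_positives)

print(f"true positives:  {true_positives:.0f}")       # 99
print(f"false positives: {false_positives:.0f}")      # 999
print(f"P(sick | +) = {p_sick_given_positive:.3f}")   # 0.090
```

Try raising `prior` to 0.1 and rerunning: the same test suddenly becomes trustworthy, which is the base-rate effect in action.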
Evidence stacks: updating again and again
Real models rarely see one clue. They see many, and Bayes lets you fold them in one at a time: yesterday's posterior is today's prior. Each piece of evidence that is independent of the others (given the hypothesis) multiplies the odds by its likelihood ratio, \( P(E \mid H) / P(E \mid \neg H) \). Below, add evidence one click at a time and watch a single belief climb (or fall) as the case builds.
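Folding in clues one at a time can be sketched in a few lines of Python. The starting belief and the three clues below are hypothetical numbers, and the clues are assumed to be independent of each other given the hypothesis. The sketch updates step by step, then checks the result against the odds shortcut (prior odds times the product of the likelihood ratios).

```python
def update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """One Bayes update for a yes/no hypothesis."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))


# Hypothetical clues as (P(E | H), P(E | not H)) pairs.
clues = [(0.8, 0.3), (0.7, 0.2), (0.9, 0.5)]

start = 0.1             # hypothetical starting belief
belief = start
history = [belief]
for p_h, p_not_h in clues:
    belief = update(belief, p_h, p_not_h)   # the posterior becomes the next prior
    history.append(belief)

# Same answer via odds: multiply the prior odds by each likelihood ratio.
odds = start / (1.0 - start)
for p_h, p_not_h in clues:
    odds *= p_h / p_not_h
belief_from_odds = odds / (1.0 + odds)

print([round(b, 3) for b in history])
print(round(belief_from_odds, 3))
```

Every clue here has a likelihood ratio above 1, so the belief climbs at each step; a clue with a ratio below 1 would pull it back down.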
Random variables: numbers that haven't happened yet
A random variable is a number whose value is uncertain — written with a capital letter such as \( X \). A fair die is the classic example: its outcome is \( X \in \{1, 2, 3, 4, 5, 6\} \), with \( P(X = k) = \tfrac{1}{6} \) for every face \( k \). Other random variables: whether an email is spam, \( X \in \{0, 1\} \); tomorrow's temperature, \( X \in \mathbb{R} \). We summarize one with its expectation \( E[X] \): the long-run average, computed by weighting each value by its probability.
For the die, \( E[X] = \sum_{k=1}^{6} k \cdot \tfrac{1}{6} = \dfrac{1+2+3+4+5+6}{6} = 3.5 \). No single roll is ever \( 3.5 \) — it is the value the average of many rolls settles toward. Roll it below and watch the running average converge.
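The same two ideas, the weighted sum and the settling average, can be sketched in Python. The seed, the number of rolls, and the checkpoints are arbitrary choices for illustration; `fractions.Fraction` keeps the weighted sum exact.

```python
import random
from fractions import Fraction

# Exact expectation: weight each face by its probability, 1/6.
faces = range(1, 7)
expectation = sum(k * Fraction(1, 6) for k in faces)
print(float(expectation))  # 3.5

# Simulated rolls: the running average drifts toward 3.5.
rng = random.Random(0)     # fixed seed so the run is repeatable
total = 0
for n in range(1, 100_001):
    total += rng.randint(1, 6)          # one fair die roll, both ends inclusive
    if n in (10, 1_000, 100_000):
        print(n, round(total / n, 3))   # running average at a few checkpoints
```

Early checkpoints can wander well away from 3.5; it is only the long-run average that the expectation describes.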
Two distributions show up constantly in ML:
- Bernoulli — a single yes/no trial with probability \( p \) of "yes." Every binary label (spam / not-spam, click / no-click) is a Bernoulli variable. A classifier's output is an estimate of that \( p \).
- Normal (the bell curve) — the distribution of things that pile up around an average: heights, measurement noise, the errors a model makes. You'll meet it again in Module 7.
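Both distributions are easy to sample with Python's standard library, which makes their shapes tangible. The parameters below (a 30% "yes" rate, heights centred on 170 cm with a spread of 8 cm) are hypothetical values chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50_000

# Bernoulli: 1 with probability p, otherwise 0.
p = 0.3                  # hypothetical "yes" probability
bernoulli = [1 if rng.random() < p else 0 for _ in range(n)]
bern_mean = statistics.fmean(bernoulli)    # should land near p

# Normal: values pile up around mu with typical spread sigma.
mu, sigma = 170.0, 8.0   # hypothetical heights in cm
normal = [rng.gauss(mu, sigma) for _ in range(n)]
normal_mean = statistics.fmean(normal)
normal_sd = statistics.stdev(normal)
within_one_sd = sum(abs(x - mu) <= sigma for x in normal) / n

print(f"Bernoulli sample mean: {bern_mean:.3f}")
print(f"Normal sample mean:    {normal_mean:.2f}, sd: {normal_sd:.2f}")
print(f"Share within one sd:   {within_one_sd:.3f}")
```

The Bernoulli sample mean estimates \( p \), because \( E[X] = p \) for a 0/1 variable, and roughly 68% of the Normal samples fall within one standard deviation of the mean.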
Check your Bayesian reflexes
These questions reward the instinct to ask "what was the base rate?" before trusting any piece of evidence.