
Module 3 — Bayes & Random Variables

Pillar 1 · Probability · hands-on · about 30 minutes.

Module 2 asked "given one clue, how likely is the event?" This module gives you the engine for updating a belief as evidence arrives — Bayes' rule — and the language of random variables that lets a model talk about uncertain quantities. Together they are how machines reason under uncertainty.

Bayes' rule: turning the question around

Often you know one conditional but want the other. A medical test tells you \( P(\text{positive} \mid \text{sick}) \) — how often it fires for sick people. But the patient wants \( P(\text{sick} \mid \text{positive}) \) — given a positive result, am I actually sick? Bayes' rule flips one into the other:

\[ P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)} \]

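Read it as three named pieces:

Prior \( P(H) \): how plausible the hypothesis was before you saw the evidence.
Likelihood \( P(E \mid H) \): how probable the evidence is if the hypothesis is true.
Posterior \( P(H \mid E) \): the updated belief once the evidence is in.

The denominator \( P(E) \) is the overall probability of the evidence; it rescales the result so the posterior is a valid probability.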

The base-rate trap

One of the most common probability mistakes is ignoring the prior. When a disease affects 1 in 1,000 people, a positive result from a "99% accurate" test is usually a false alarm: healthy people so outnumber sick ones that even a tiny false-positive rate produces more false alarms than true cases.

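Put numbers to it. The prior is \( P(\text{sick}) = 0.001 \). Take "99% accurate" to mean both that the test catches 99% of sick people, \( P(+ \mid \text{sick}) = 0.99 \), and that it wrongly flags 1% of healthy people, \( P(+ \mid \text{healthy}) = 0.01 \). Since \( P(\text{healthy}) = 1 - 0.001 = 0.999 \), writing Bayes' rule out in full gives: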

\[ P(\text{sick} \mid +) \;=\; \frac{P(+ \mid \text{sick})\,P(\text{sick})}{P(+ \mid \text{sick})\,P(\text{sick}) \;+\; P(+ \mid \text{healthy})\,P(\text{healthy})} \;=\; \frac{0.99 \times 0.001}{0.99 \times 0.001 \;+\; 0.01 \times 0.999} \;\approx\; \frac{1}{11} \;\approx\; 9\% \]

Per 1,000 people that is about 1 true positive against roughly 10 false alarms — the rare base rate dominates. The slider demo makes this undeniable: drag the disease down to "rare" and watch a positive test stay mostly wrong.
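The arithmetic above can be checked in a few lines of Python. This is a minimal sketch, not part of the lesson's demo: the function name `posterior` and its parameter names are chosen here for illustration, and it assumes the test is described by just two numbers, \( P(+ \mid \text{sick}) \) and \( P(+ \mid \text{healthy}) \).

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(sick | positive) using Bayes' rule with P(+) expanded.

    prior               -- P(sick), the base rate
    sensitivity         -- P(+ | sick)
    false_positive_rate -- P(+ | healthy)
    """
    true_pos = sensitivity * prior                    # P(+ | sick) * P(sick)
    false_pos = false_positive_rate * (1.0 - prior)   # P(+ | healthy) * P(healthy)
    return true_pos / (true_pos + false_pos)


# Same test, different base rates: only the prior changes.
for base_rate in (0.001, 0.01, 0.1, 0.5):
    p = posterior(base_rate, sensitivity=0.99, false_positive_rate=0.01)
    print(f"P(sick) = {base_rate:<5}  ->  P(sick | +) = {p:.3f}")
```

With the lesson's numbers (prior 0.001) the result is about 0.09, matching the hand calculation. Raising the base rate while leaving the test unchanged shows how strongly the prior drives the answer.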


Evidence stacks: updating again and again

Real models rarely see one clue. They see many, and Bayes lets you fold them in one at a time: yesterday's posterior is today's prior. In odds form, each independent piece of evidence multiplies the odds \( P(H)/P(\neg H) \) by its likelihood ratio \( P(E \mid H)/P(E \mid \neg H) \). Below, add evidence one click at a time and watch a single belief climb (or fall) as the case builds.
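Here is one way the "posterior becomes the next prior" loop might look in Python. It is a sketch under a stated assumption: the pieces of evidence are conditionally independent given the hypothesis. The function names and the example of three repeated positive tests are hypothetical, chosen to reuse the medical-test numbers from earlier.

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """One Bayes step: return P(H | E) from P(H) and the two likelihoods."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))


def sequential_update(prior, evidence):
    """Fold in evidence one item at a time.

    evidence is a list of (P(E | H), P(E | not H)) pairs, assumed
    conditionally independent given H. Returns the belief after each step.
    """
    beliefs = []
    belief = prior
    for p_e_given_h, p_e_given_not_h in evidence:
        belief = bayes_update(belief, p_e_given_h, p_e_given_not_h)
        beliefs.append(belief)  # this posterior is the next step's prior
    return beliefs


# Hypothetical scenario: three independent positive results from the same test.
history = sequential_update(0.001, [(0.99, 0.01)] * 3)
for step, belief in enumerate(history, start=1):
    print(f"after positive test {step}: P(sick) = {belief:.4f}")
```

Each positive result here has likelihood ratio 0.99 / 0.01 = 99, so the odds are multiplied by 99 at every step. That is why the belief moves from under 10% after one test to above 90% after two.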


Random variables: numbers that haven't happened yet

A random variable is a number whose value is uncertain, written with a capital letter such as \( X \). A fair die is the classic example: its outcome is \( X \in \{1, 2, 3, 4, 5, 6\} \), with \( P(X = k) = \tfrac{1}{6} \) for every face \( k \). Other random variables: whether an email is spam, \( X \in \{0, 1\} \); tomorrow's temperature, \( X \in \mathbb{R} \). We summarize one with its expectation \( E[X] \): the long-run average, computed for a discrete variable by weighting each value by its probability.

\[ E[X] \;=\; \sum_i x_i \, P(X = x_i) \]

For the die, \( E[X] = \sum_{k=1}^{6} k \cdot \tfrac{1}{6} = \dfrac{1+2+3+4+5+6}{6} = 3.5 \). No single roll is ever \( 3.5 \) — it is the value the average of many rolls settles toward. Roll it below and watch the running average converge.
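The same two ideas, the exact expectation and the running average that drifts toward it, can be sketched in Python. The helper names `expectation` and `running_average` are illustrative, and the fixed seed is only there to make the run repeatable.

```python
import random
from fractions import Fraction


def expectation(distribution):
    """E[X] for a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in distribution.items())


def running_average(num_rolls, seed=0):
    """Simulate fair die rolls and return the average after each roll."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    total = 0
    averages = []
    for n in range(1, num_rolls + 1):
        total += rng.randint(1, 6)  # randint includes both endpoints
        averages.append(total / n)
    return averages


# Exact expectation, using fractions to avoid rounding.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print("exact E[X] =", float(expectation(die)))

# Running average at a few checkpoints.
averages = running_average(20_000)
for n in (10, 100, 1_000, 20_000):
    print(f"average after {n:>6} rolls: {averages[n - 1]:.3f}")
```

Early averages can sit noticeably away from 3.5; the later ones should be close to it, which is the convergence the paragraph above describes.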


Two distributions show up constantly in ML:


AI anchor: naive Bayes & adaptive testing

A naive Bayes spam filter starts with the prior \( P(\text{spam}) \), then multiplies in the likelihood of each word it sees ("free," "invoice," "meeting"), updating the posterior word by word, exactly like the evidence-stack demo above. ("Naive" just means it pretends the words are independent of one another given the class, which works shockingly well.) The same machinery powers adaptive testing: QuantegyAI's own engine keeps a belief about your ability and updates it after every question. A right answer is evidence you're stronger, a wrong one evidence you're weaker, and the next question is chosen to sharpen that belief fastest.
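A toy version of the word-by-word update might look like the sketch below. Every number in it is hypothetical, invented purely for illustration, and it simplifies real naive Bayes filters by using only the words that appear and ignoring unknown ones. It works in log-odds so that multiplying likelihood ratios becomes adding their logarithms.

```python
import math

# Hypothetical values for illustration only, not measured from any dataset.
PRIOR_SPAM = 0.4
WORD_LIKELIHOODS = {
    # word: (P(word | spam), P(word | not spam))
    "free": (0.30, 0.03),
    "invoice": (0.05, 0.10),
    "meeting": (0.01, 0.15),
}


def spam_posterior(words, prior=PRIOR_SPAM, likelihoods=WORD_LIKELIHOODS):
    """Return P(spam | words), treating words as independent given the class."""
    log_odds = math.log(prior / (1.0 - prior))
    for word in words:
        if word in likelihoods:  # words with no entry are skipped
            p_spam, p_ham = likelihoods[word]
            log_odds += math.log(p_spam / p_ham)  # add this word's evidence
    return 1.0 / (1.0 + math.exp(-log_odds))  # convert log-odds to probability


for message in (["free"], ["free", "invoice"], ["meeting", "invoice"]):
    print(message, "->", round(spam_posterior(message), 3))
```

Words that are more likely under spam push the belief up, and words more likely under ordinary mail pull it down. The order of the words does not matter, since the updates simply add.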

Check your Bayesian reflexes

These questions reward the instinct to ask "what was the base rate?" before trusting any piece of evidence.


Why this matters next

Bayes is the direct ancestor of the naive Bayes classifier you'll build in Course 3, and the prior/posterior idea underlies how models are trained and regularized. Random variables and expectation are the language of loss functions (an average over data is an expectation) and of every probabilistic output a model produces. The normal distribution returns in Module 7 as the backbone of statistics and evaluation.

One-sentence summary: Bayes' rule updates a prior belief into a posterior as evidence arrives, \[ P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)} \] and a random variable is an uncertain number whose expectation \( E[X] = \sum_i x_i \, P(X = x_i) \) is its probability-weighted average.
