Module 3 — Bayes & Random Variables

Pillar 1 · Probability · hands-on · about 30 minutes.

Module 2 addressed the question "given one item of evidence, what is the probability of the event?" This module introduces the mechanism for updating a belief as evidence is observed — Bayes' rule — together with the formalism of random variables used to describe uncertain quantities. These are the foundations of probabilistic reasoning under uncertainty.

Bayes' rule: inverting a conditional probability

Often one conditional probability is known when the reverse one is needed. A medical test provides \( P(\text{positive} \mid \text{sick}) \), the probability of a positive result for a sick patient. The patient, however, needs \( P(\text{sick} \mid \text{positive}) \), the probability of being sick given a positive result. Bayes' rule relates the two:

\[ P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)} \]

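Read it as four named pieces:

Prior \( P(H) \): what you believed before the evidence (the base rate).
Likelihood \( P(E \mid H) \): how well the hypothesis explains the evidence.
Evidence \( P(E) \): the overall probability of seeing the evidence, whether or not the hypothesis holds. It normalizes the result so the posterior is a valid probability.
Posterior \( P(H \mid E) \): your updated belief after seeing the evidence. It becomes the new prior for the next clue.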
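
As a minimal sketch, Bayes' rule for a yes/no hypothesis can be written as one small function, with \( P(E) \) expanded by the law of total probability over "H" and "not H". The function name, parameter names, and the numbers in the usage line are illustrative assumptions, not values from this lesson.

```python
def posterior(prior: float, likelihood: float, likelihood_if_not: float) -> float:
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    # P(E) by the law of total probability: evidence can arise with or without H.
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    return likelihood * prior / evidence


# Illustrative numbers only: a 30% prior, and evidence that is
# 80% likely if H is true and 10% likely if H is false.
p = posterior(prior=0.3, likelihood=0.8, likelihood_if_not=0.1)
print(f"P(H | E) = {p:.3f}")
```

With evidence that is eight times more likely under the hypothesis than without it, the belief rises well above the 30% prior, but the prior still limits how far it moves.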

Before any evidence is observed, the appropriate belief is the prior — the base rate, and nothing further. A common error is to instead use the most salient available number. Consider: which value answers the question when no test has been performed?

In this activity's example, no test has been run yet, so \( P(\text{buggy}) \) is simply the base rate, 10%. The 90% and 20% figures are likelihoods, which matter only once a test result is observed.

The base-rate fallacy

The most common error in probabilistic reasoning is neglecting the prior. For a test that is "99% accurate" applied to a disease with a prevalence of 1 in 1,000, a positive result is more often than not a false positive — because the healthy population is so much larger that even a small false-positive rate generates more false positives than true positives.

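Put numbers on it. The prior is \( P(\text{sick}) = 0.001 \). Taking "99% accurate" to mean that the test is right 99% of the time for sick and healthy patients alike, \( P(+ \mid \text{sick}) = 0.99 \) and the false-positive rate is \( P(+ \mid \text{healthy}) = 0.01 \). Writing Bayes' rule out in full, with \( P(+) \) expanded over the sick and healthy cases: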

\[ P(\text{sick} \mid +) \;=\; \frac{P(+ \mid \text{sick})\,P(\text{sick})}{P(+ \mid \text{sick})\,P(\text{sick}) \;+\; P(+ \mid \text{healthy})\,P(\text{healthy})} \;=\; \frac{0.99 \times 0.001}{0.99 \times 0.001 \;+\; 0.01 \times 0.999} \;\approx\; \frac{1}{11} \;\approx\; 9\% \]

Per 1,000 people tested, this corresponds to about 1 true positive against about 10 false positives, so the low base rate dominates the result. The activity below demonstrates this: move the disease prevalence toward "rare" and observe that most positive results are false positives, even though the test itself is 99% accurate.
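
The arithmetic above can be checked with a short script. This is a sketch using the lesson's numbers (prevalence 0.001, \( P(+ \mid \text{sick}) = 0.99 \), false-positive rate 0.01); the function name and the extra prevalence values in the loop are assumptions added for illustration.

```python
def p_sick_given_positive(prevalence: float,
                          sensitivity: float = 0.99,
                          false_positive_rate: float = 0.01) -> float:
    """P(sick | +) from the base rate and the two test error rates."""
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# The lesson's case: a disease with a prevalence of 1 in 1,000.
result = p_sick_given_positive(0.001)
print(f"P(sick | +) at prevalence 0.001: {result:.3f}")

# Same test, different base rates (illustrative values).
for prevalence in (0.5, 0.1, 0.01, 0.001, 0.0001):
    print(f"prevalence {prevalence:<7} -> P(sick | +) = "
          f"{p_sick_given_positive(prevalence):.3f}")
```

The test's error rates never change across the loop. Only the prior does, and it alone moves the answer from near certainty to a small fraction.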

Sequential updating across multiple observations

Models typically observe many items of evidence rather than one. Bayes' rule lets these be incorporated sequentially: the posterior from one update serves as the prior for the next. In odds form, each conditionally independent item of evidence multiplies the odds \( P(H)/P(\neg H) \) by its likelihood ratio \( P(E \mid H) / P(E \mid \neg H) \), so a ratio above 1 raises the belief and a ratio below 1 lowers it. In the activity below, add evidence one item at a time and observe the belief increase or decrease as the evidence accumulates.
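
A small sketch of sequential updating, assuming the items of evidence are conditionally independent given the hypothesis. The prior and the three likelihood pairs are made-up numbers for illustration. The script updates the belief one item at a time, then repeats the calculation in odds form to show that both routes agree.

```python
def update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """One Bayes update for a yes/no hypothesis."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))


# Hypothetical evidence, each item as (P(E | H), P(E | not H)).
evidence = [(0.9, 0.2), (0.7, 0.4), (0.3, 0.6)]
start = 0.10  # hypothetical prior

# Route 1: the posterior from each step becomes the prior for the next.
belief = start
history = [belief]
for p_h, p_not_h in evidence:
    belief = update(belief, p_h, p_not_h)
    history.append(belief)

# Route 2: multiply the prior odds by each likelihood ratio, then convert back.
odds = start / (1.0 - start)
for p_h, p_not_h in evidence:
    odds *= p_h / p_not_h
belief_from_odds = odds / (1.0 + odds)

print([round(b, 4) for b in history])
print(round(belief_from_odds, 4))
```

The first two items have likelihood ratios above 1 and raise the belief. The third has a ratio of 0.5 and pulls it back down.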

Random variables: quantities with uncertain values

A random variable is a number whose value is uncertain — written with a capital letter such as \( X \). A fair die is the classic example: its outcome is \( X \in \{1, 2, 3, 4, 5, 6\} \), with \( P(X = k) = \tfrac{1}{6} \) for every face \( k \). Other random variables: whether an email is spam, \( X \in \{0, 1\} \); tomorrow's temperature, \( X \in \mathbb{R} \). We summarize one with its expectation \( E[X] \): the long-run average, computed by weighting each value by its probability.

\[ E[X] \;=\; \sum_i x_i \, P(X = x_i) \]

For the die, \( E[X] = \sum_{k=1}^{6} k \cdot \tfrac{1}{6} = \dfrac{1+2+3+4+5+6}{6} = 3.5 \). No single roll is ever \( 3.5 \) — it is the value the average of many rolls settles toward. Roll it below and watch the running average converge.
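
The die calculation and the convergence of the running average can be sketched as follows. The seed, the number of rolls, and the checkpoints are arbitrary choices for illustration; only Python's standard library is used.

```python
import random
from fractions import Fraction

# Exact expectation: each face k weighted by its probability 1/6.
faces = range(1, 7)
expected = sum(k * Fraction(1, 6) for k in faces)
print(expected)  # 7/2

# Simulated rolls: the running average drifts toward 3.5 as rolls accumulate.
rng = random.Random(0)  # fixed seed so the run is repeatable
n_rolls = 100_000
total = 0
for n in range(1, n_rolls + 1):
    total += rng.randint(1, 6)  # randint includes both endpoints
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"after {n:>7} rolls: average = {total / n:.4f}")
running_average = total / n_rolls
```

Early averages can sit far from 3.5. The later ones tighten around it because each extra roll carries less and less weight in the average.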

Two distributions show up constantly in ML:

Bernoulli — a single yes/no trial with probability \( p \) of "yes." Every binary label (spam / not-spam, click / no-click) is a Bernoulli variable. A classifier's output is an estimate of that \( p \).
Normal (the bell curve) — the distribution of things that pile up around an average: heights, measurement noise, the errors a model makes. You'll meet it again in Module 7.
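
Both distributions can be sampled with Python's standard library. In this sketch the parameters (\( p = 0.3 \), a mean of 170 and a standard deviation of 8) are hypothetical. It relies on two standard facts: a Bernoulli variable has expectation \( p \), and about 68% of normal samples fall within one standard deviation of the mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 50_000

# Bernoulli: 1 with probability p, else 0. The sample mean estimates p.
p = 0.3  # hypothetical "yes" probability
bernoulli = [1 if rng.random() < p else 0 for _ in range(n)]
bernoulli_mean = statistics.fmean(bernoulli)

# Normal: values pile up around mu with spread sigma (hypothetical values).
mu, sigma = 170.0, 8.0
normal = [rng.gauss(mu, sigma) for _ in range(n)]
normal_mean = statistics.fmean(normal)
normal_sd = statistics.stdev(normal)
within_one_sd = sum(abs(x - mu) <= sigma for x in normal) / n

print(f"Bernoulli sample mean: {bernoulli_mean:.3f}")
print(f"Normal sample mean: {normal_mean:.2f}, sample sd: {normal_sd:.2f}")
print(f"Share within one sd of the mean: {within_one_sd:.3f}")
```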

AI anchor: naive Bayes and adaptive testing

A naive Bayes spam filter begins with the prior \( P(\text{spam}) \) and multiplies in the likelihood of each observed word ("free," "invoice," "meeting"), updating the posterior word by word, as in the sequential-updating activity above. (The term "naive" refers to the assumption that the words are conditionally independent given the class. Real text violates that assumption, yet the classifier often performs well in practice.) The same mechanism underlies adaptive testing: QuantegyAI's engine maintains a belief about a student's ability and updates it after each question, treating a correct answer as evidence of greater ability and an incorrect answer as evidence of lesser ability, and selects each subsequent question to reduce the uncertainty in that belief most rapidly.
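
A toy version of the word-by-word update, as a sketch only. The prior and every word likelihood below are invented for illustration, and the function name is hypothetical. It is a simplification: it scores only the words that appear and ignores absent words, which a full naive Bayes model would also account for. Working in log-odds turns the repeated multiplication into addition, which avoids numerical underflow on long messages.

```python
import math

PRIOR_SPAM = 0.4  # hypothetical base rate of spam

# Hypothetical likelihoods: (P(word | spam), P(word | not spam)).
WORD_LIKELIHOODS = {
    "free": (0.30, 0.03),
    "invoice": (0.10, 0.08),
    "meeting": (0.02, 0.20),
}


def spam_posterior(words, prior=PRIOR_SPAM):
    """Return P(spam | words), treating words as conditionally independent."""
    log_odds = math.log(prior / (1.0 - prior))
    for word in words:
        if word in WORD_LIKELIHOODS:  # unknown words carry no evidence here
            p_spam, p_not_spam = WORD_LIKELIHOODS[word]
            log_odds += math.log(p_spam / p_not_spam)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


for message in ([], ["free"], ["free", "invoice"], ["free", "invoice", "meeting"]):
    print(message, round(spam_posterior(message), 3))
```

With these made-up numbers, "free" pushes the belief sharply toward spam, "invoice" barely moves it, and "meeting" pulls it back toward not-spam.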

Check your understanding

These questions emphasize accounting for the base rate before drawing conclusions from any item of evidence.

Why this matters next

Bayes is the direct ancestor of the naive Bayes classifier you'll build in Course 3, and the prior/posterior idea underlies how models are trained and regularized. Random variables and expectation are the language of loss functions (an average over data is an expectation) and of every probabilistic output a model produces. The normal distribution returns in Module 7 as the backbone of statistics and evaluation.

One-sentence summary: Bayes' rule updates a prior belief into a posterior as evidence arrives, \[ P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)} \] and a random variable is an uncertain number whose expectation \( E[X] = \sum_i x_i \, P(X = x_i) \) is its probability-weighted average.
