
Module 1 — The Language and Notation of ML

Warm-up · hands-on · about 25 minutes.

Machine-learning math looks intimidating mostly because of its notation — the symbols. But notation is just shorthand: a compact way to write ideas you already understand. This module decodes the handful of symbols that appear in almost every ML formula, so the later modules read like sentences instead of hieroglyphics.

Nothing here is harder than "add these numbers up" or "this depends on that." We are learning the shorthand, not new math.

Decode it yourself

Each card shows a piece of notation on one side and its plain-English meaning on the other. Tap a card to flip it. Read the symbol, guess the meaning, then check.

This activity needs JavaScript. The lesson below still covers everything.

Functions: inputs in, outputs out

A function is a rule that takes an input and returns an output. We write \( f(x) \) — read "f of x" — for "the output of rule \( f \) when the input is \( x \)." The letter is just a name; \( f(x) \), \( g(t) \), \( \text{loss}(w) \) are all the same idea.

This is the whole mental model of a model: a machine-learning model is a function. You feed it an input \( x \) (an email, an image, a row of data) and it returns an output \( \hat{y} \) (read "y-hat") — its prediction. Training is the search for the version of that function that makes the best predictions.
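To make "a model is a function" concrete, here is a minimal Python sketch. The rule `f` and the helper `predict` are made-up names for illustration, not part of any library; a real model would be a far more complicated rule, but it would be used the same way.

```python
def f(x):
    """A hypothetical rule: double the input and add one."""
    return 2 * x + 1


def predict(rule, x):
    """Apply a rule to an input x and return its output, y_hat."""
    y_hat = rule(x)
    return y_hat


y_hat = predict(f, 3)
print(y_hat)  # 7
```

Swapping in a different rule changes the predictions without changing how `predict` is called, which is roughly what training does: it searches for a better rule.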

Try the machine below: choose a rule \( f \), feed it an input \( x \), and watch the output \( \hat{y} \) come out the other side.


Subscripts and indices: pointing at items in a list

Data comes in lists. A subscript is a small number or letter, set just below the line, that points at one item: \( x_1 \) is the first item, \( x_2 \) the second, and \( x_i \) is "the \( i \)-th item," where \( i \) is a stand-in for any position. If you have \( n \) data points, they are \( x_1, x_2, \ldots, x_n \).

Don't confuse a subscript with a power: \( x_2 \) (subscript) means "the second item"; \( x^2 \) (superscript) means "x squared." Position vs. exponent.
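In code, a subscript becomes a list index. One caveat worth stating as an assumption of this sketch: Python counts positions from 0, while the math above counts from 1, so \( x_1 \) lives at `x[0]`. The data values and the helper `item` are invented for illustration.

```python
x = [4.0, 7.0, 1.5, 9.0]  # made-up data: x_1, x_2, x_3, x_4
n = len(x)                # n = 4 data points


def item(values, i):
    """Return x_i using the math convention, where the first item is i = 1."""
    return values[i - 1]


x_2 = item(x, 2)      # subscript: the second item, 7.0
x_squared = x_2 ** 2  # superscript: a power, 7.0 squared = 49.0
```

The same contrast as in the text shows up here: `item(x, 2)` picks a position, while `** 2` is an exponent.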

Slide the index \( i \) below to point at any item in the list — the label underneath each item is its subscript.


Summation: the Σ just means "add them up"

The big Greek S, \( \sum \) (sigma), is one of the most common symbols in ML. It means "add up a list of things." The decorations tell you where to start and stop:

\[ \sum_{i=1}^{n} x_i \;=\; x_1 + x_2 + \cdots + x_n \]

Read it from the bottom up: "start at \( i = 1 \), go up to \( n \), and add up every \( x_i \)." That's it. Move the slider below and watch the sum expand.
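A summation is a loop in disguise. The sketch below, using made-up numbers, spells out the \( \sum \) as a `for` loop and then checks it against Python's built-in `sum`. The `i - 1` is only there because Python lists start at position 0.

```python
x = [3, 1, 4, 1, 5]  # made-up data: x_1 through x_5
n = len(x)

total = 0
for i in range(1, n + 1):  # start at i = 1, go up to n
    total += x[i - 1]      # add up every x_i

assert total == sum(x)     # the built-in does the same job
print(total)  # 14
```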


Logs and exponentials: why they are everywhere

Two more symbols turn up constantly: the exponential \( e^x \) and the logarithm \( \log(x) \). They are opposites: a log undoes an exponential, so \( \log(e^x) = x \). (In ML, \( \log \) almost always means the natural logarithm, base \( e \).) You do not need to compute them by hand; you need to know why ML reaches for them.

ML reaches for logs because of one rule: \( \log(a \times b) = \log a + \log b \). A log turns multiplication into addition, and that is the main reason logs show up. Multiply two numbers below, and watch their logs simply add:
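The product rule can be checked numerically. This is a small sketch with arbitrary values for `a` and `b`; it uses `math.isclose` rather than `==` because floating-point results can differ in the last few digits.

```python
import math

a, b = 8.0, 0.25  # arbitrary positive numbers

log_of_product = math.log(a * b)          # log(a × b)
sum_of_logs = math.log(a) + math.log(b)   # log a + log b

print(math.isclose(log_of_product, sum_of_logs))   # True
print(math.isclose(math.log(math.exp(2.5)), 2.5))  # True: log undoes exp
```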


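This matters because ML formulas often multiply many probabilities together, and since each one is at most 1, the product shrinks quickly. Try it: the demo below multiplies several probabilities and shows how the product collapses toward zero, while the sum of logs stays manageable.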
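The collapse is easy to reproduce. The sketch below assumes 200 made-up probabilities of 0.01 each. Their true product is \( 10^{-400} \), which is smaller than any value a standard 64-bit float can hold, so the computed product becomes exactly zero. The sum of logs carries the same information as an ordinary negative number.

```python
import math

probs = [0.01] * 200  # made-up: 200 small probabilities

product = 1.0
for p in probs:
    product *= p  # shrinks by a factor of 100 each step

log_sum = sum(math.log(p) for p in probs)  # 200 * log(0.01)

print(product)  # 0.0 (underflow: the true value is too small to store)
print(log_sum)  # roughly -921.03
```

Once the product has hit zero, nothing can be recovered from it, whereas two different log sums can still be compared. That is the practical reason for working with sums of logs.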


AI anchor: reading a real loss function

Here is a loss function used to train real models, the mean squared error:

\[ L \;=\; \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \]

You can now read every piece. \( y_i \) is the true answer for example \( i \); \( \hat{y}_i \) is the model's prediction for it; \( (y_i - \hat{y}_i) \) is the error on that example; we square it so over- and under-shooting both count as positive; \( \sum \) adds those squared errors over all \( n \) examples; and \( \frac{1}{n} \) averages them. "Loss" is just a number that measures how wrong the model is, and training tries to make it small. That sentence is the spine of the whole course.
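The formula translates almost symbol for symbol into code. This is a plain-Python sketch with made-up true values and predictions; the function name `mean_squared_error` is chosen here for illustration and is written out by hand rather than taken from a library.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between truths and predictions."""
    n = len(y_true)
    total = 0.0
    for y_i, y_hat_i in zip(y_true, y_pred):
        error = y_i - y_hat_i  # error on example i
        total += error ** 2    # square it, then add it to the running sum
    return total / n           # the 1/n averages over all n examples


y = [3.0, 5.0, 2.0]      # made-up true answers
y_hat = [2.0, 5.0, 4.0]  # made-up predictions

loss = mean_squared_error(y, y_hat)  # (1 + 0 + 4) / 3
```

A perfect set of predictions gives a loss of zero, and the loss grows as predictions drift from the true answers.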

Put it together

Translate each expression to plain English. You will get a score — guessing is fine, that is how the shorthand sticks.


Why this matters next

Every remaining module leans on this shorthand. Conditional probability is written \( P(A\mid B) \); a vector is \( \mathbf{x} = [x_1, x_2, \ldots] \); the dot product is a \( \sum \); a gradient is a list of slopes; and the loss above is exactly what gradient descent minimizes in Module 6. Learn to read the symbols once, and the rest of the math is mostly reading.

One-sentence summary: ML notation is compact shorthand for simple ideas: \( f(x) \) is "rule applied to input," a subscript points at a list item, \( \sum \) means "add these up," and logs appear because they turn fragile products of probabilities into stable sums.
