
Module 4 — Vectors & Data

Pillar 2 · Linear algebra · hands-on · about 30 minutes.

Every piece of data a model sees — an image, a sentence, a customer — is turned into a vector: a list of numbers. Once data is vectors, "how similar are these two things?" becomes a geometry question with a precise answer. This module builds vectors, the dot product, and the similarity measure that lets a language model know "king" and "queen" are related.

A vector is a list of numbers — and an arrow

A vector is an ordered list: \( \mathbf{x} = [x_1, x_2, \ldots, x_n] \). Each number is a feature — a measurable property. A house might be \( [1500, 3, 2] \) (square feet, bedrooms, baths); a word in a language model might be 768 numbers capturing its meaning. With two features we can draw the vector as an arrow on a plane, which is how we'll build intuition.
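
As a small illustration, the house example can be written as a plain Python list. This is only a sketch: the feature names in `FEATURES` are labels chosen here for readability, and the only thing the text fixes is the order (square feet, bedrooms, baths).

```python
# Feature order is a convention: every house vector must use the same order.
FEATURES = ["square_feet", "bedrooms", "baths"]  # illustrative names

house = [1500.0, 3.0, 2.0]  # the example vector from the text

# Pair each number with the feature it measures.
labeled = dict(zip(FEATURES, house))

# The dimension of a vector is simply how many features it has.
dimension = len(house)

print(labeled)
print(dimension)  # 3
```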

Drag the two arrows below. Everything else — length, dot product, similarity — updates live.


Length (norm): how big is a vector?

The norm \( \lVert \mathbf{x} \rVert \) is the arrow's length — straight from the Pythagorean theorem:

\[ \lVert \mathbf{x} \rVert \;=\; \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} \]

It measures magnitude — how far the data point sits from the origin. Models often normalize vectors (scale them to length 1) so that comparisons are about direction, not size.
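
The norm formula and the idea of normalizing can be sketched in a few lines of standard-library Python. The helper names `norm` and `normalize` are chosen here for illustration; real projects would usually rely on a numerical library instead.

```python
import math

def norm(x):
    """Euclidean length: square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for v in x))

def normalize(x):
    """Scale x to length 1 while keeping its direction."""
    length = norm(x)
    if length == 0:
        # The zero vector has no direction, so it cannot be normalized.
        raise ValueError("cannot normalize the zero vector")
    return [v / length for v in x]

v = [3.0, 4.0]
print(norm(v))       # 5.0 (the 3-4-5 right triangle)
print(normalize(v))  # [0.6, 0.8]
```

After normalizing, the vector still points the same way, but its length is 1 (up to floating-point rounding).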

The dot product: the workhorse of ML

The dot product multiplies matching entries and adds them up — a single number:

\[ \mathbf{a} \cdot \mathbf{b} \;=\; a_1 b_1 + a_2 b_2 + \cdots + a_n b_n \]

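It is positive when two vectors point in broadly the same direction (less than 90° apart), zero when they're perpendicular (unrelated), and negative when they point in broadly opposite directions. Its size also grows with the vectors' lengths, so a large value can mean strong alignment, long vectors, or both. That one number, a weighted sum, is the core of what a single neuron computes (before a bias and an activation function are applied), and it powers search, recommendations, and attention in transformers.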
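
A minimal sketch of the dot product in plain Python, showing the three sign cases and the weighted-sum reading. The function name `dot` and the example numbers (including `weights` and `inputs`) are made up for illustration.

```python
def dot(a, b):
    """Multiply matching entries and add the products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

print(dot([2, 1], [4, 2]))    # 10  (same direction)
print(dot([2, 1], [-1, 2]))   # 0   (perpendicular)
print(dot([2, 1], [-2, -1]))  # -5  (opposite direction)

# Read as a weighted sum: each input is scaled by its weight, then all are added.
weights = [0.5, -1.0, 2.0]  # hypothetical neuron weights
inputs = [4.0, 1.0, 0.5]    # hypothetical input features
print(dot(weights, inputs))  # 2.0
```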

Cosine similarity: direction without size

To ask "do these point the same way?" while ignoring length, divide the dot product by the product of the two norms. The result is the cosine of the angle between them, known as cosine similarity (it is defined only when neither vector is the zero vector):

\[ \cos\theta \;=\; \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert\,\lVert \mathbf{b} \rVert} \]

It runs from +1 (same direction) through 0 (perpendicular) to −1 (opposite). The playground shows it as you drag — line the arrows up and watch it climb to 1.
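
The formula above can be sketched in standard-library Python as follows. The function name `cosine_similarity` is an illustrative choice, and the example vectors are picked so the arithmetic comes out exactly.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so the angle is undefined.
        raise ValueError("cosine similarity is undefined for the zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([3, 4], [6, 8]))    # 1.0   same direction, different length
print(cosine_similarity([3, 4], [-4, 3]))   # 0.0   perpendicular
print(cosine_similarity([3, 4], [-3, -4]))  # -1.0  opposite direction
```

Note that `[3, 4]` and `[6, 8]` score 1.0 even though one is twice as long: only direction matters.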

AI anchor: how LLMs represent meaning

A language model turns every word into an embedding, a vector of hundreds of numbers learned so that words used in similar contexts land in similar directions. "King" and "queen" point almost the same way; "king" and "banana" don't. Relatedness between embeddings is commonly measured with the cosine similarity above, or with the closely related dot product. Semantic search, recommendations, and retrieval-augmented chat typically rank results by such a score between embedding vectors, so the 2-D arrows you're dragging are the same idea in, for example, 768 dimensions.
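
To make the ranking idea concrete, here is a toy sketch. The 3-D vectors below are hand-picked for illustration and are NOT real embeddings from any model; the names `toy_embeddings` and `query` are likewise hypothetical. Real systems apply the same scoring step to learned vectors with far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors (assumption: not taken from any real model).
toy_embeddings = {
    "banana": [0.1, 0.0, 0.9],
    "queen": [0.9, 0.8, 0.1],
    "apple": [0.2, 0.1, 0.8],
}
query = [1.0, 0.7, 0.1]  # toy stand-in for the embedding of "king"

# Rank candidate words from most to least similar to the query.
ranked = sorted(
    toy_embeddings,
    key=lambda word: cosine_similarity(query, toy_embeddings[word]),
    reverse=True,
)
print(ranked)  # ['queen', 'apple', 'banana']
```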

Make the call

Given two vectors, predict their dot product's sign and roughly how similar they are. You'll get a score.


Why this matters next

A row of a dataset is a vector; a whole dataset is a stack of them, which forms a matrix (Module 5). The dot product you just built is the atom of matrix multiplication, which is how a neural-network layer transforms data. And cosine similarity is a common scoring function behind the RAG and recommendation systems you'll meet in Course 4.
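
As a brief preview of that connection, multiplying a dataset (a stack of row vectors) by a weight vector amounts to taking one dot product per row. This is a sketch only: the second house and the `weights` values are invented for illustration.

```python
def dot(a, b):
    """Multiply matching entries and add the products."""
    return sum(x * y for x, y in zip(a, b))

# Each row is one house: [square feet, bedrooms, baths].
dataset = [
    [1500, 3, 2],  # the house from the text
    [900, 2, 1],   # a second, made-up house
]
weights = [1, 100, 50]  # hypothetical weights, one per feature

# Matrix-vector product: one dot product per row of the dataset.
scores = [dot(row, weights) for row in dataset]
print(scores)  # [1900, 1150]
```
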
One-sentence summary: data becomes vectors (lists of feature numbers); their length is the norm \( \sqrt{\sum x_i^2} \); the dot product \( \sum a_i b_i \) measures alignment; and cosine similarity normalizes that into an angle-based score from −1 to +1 — the way models measure "how related are these two things?"
