Module 4 — Vectors & Data

Pillar 2 · Linear algebra · hands-on · about 30 minutes.

Every input a model processes, whether an image, a sentence, or a customer record, is represented as a vector: an ordered list of numbers. Once data is represented as vectors, the question "how similar are two items?" becomes a geometric question with a precise answer. This module develops vectors, the dot product, and cosine similarity, the measure commonly used to quantify how related two word vectors such as "king" and "queen" are.

A vector as an ordered list and as a geometric object

A vector is an ordered list: \( \mathbf{x} = [x_1, x_2, \ldots, x_n] \). Each component is a feature, a measurable property. A house might be represented as \( [1500, 3, 2] \) (square feet, bedrooms, bathrooms); a word in a language model might be represented by 768 numbers encoding its meaning. With two features, a vector can be drawn as an arrow in the plane; this module uses such 2-D pictures to build geometric intuition.
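A minimal Python sketch of the house example, assuming a fixed feature order of (square feet, bedrooms, bathrooms). `FEATURES` and `to_vector` are hypothetical names chosen for illustration; the point is that a vector is only meaningful when every record uses the same component order.

```python
# Hypothetical feature order; every vector built from a record uses this order.
FEATURES = ("square_feet", "bedrooms", "bathrooms")

def to_vector(record):
    """Turn a record (feature name -> value) into an ordered list of numbers."""
    return [float(record[name]) for name in FEATURES]

# The dict's key order does not matter; FEATURES fixes the component order.
house = {"bedrooms": 3, "bathrooms": 2, "square_feet": 1500}
x = to_vector(house)
print(x)       # [1500.0, 3.0, 2.0]
print(len(x))  # 3 components, one per feature
```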



Length (norm) of a vector

The norm \( \lVert \mathbf{x} \rVert \) is the length of the vector, given directly by the Pythagorean theorem:

\[ \lVert \mathbf{x} \rVert \;=\; \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} \]

It measures magnitude — how far the data point sits from the origin. Models often normalize vectors (scale them to length 1) so that comparisons are about direction, not size.
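The norm and normalization can be sketched in plain Python (standard library only). The names `norm` and `normalize` are illustrative, and raising an error for the zero vector is a design choice assumed here, since the zero vector has no direction to preserve.

```python
import math

def norm(x):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(xi * xi for xi in x))

def normalize(x):
    """Scale x to length 1 by dividing every component by its norm."""
    length = norm(x)
    if length == 0:
        raise ValueError("the zero vector has no direction and cannot be normalized")
    return [xi / length for xi in x]

x = [1.0, 2.0, 2.0]
u = normalize(x)
print(norm(x))  # 3.0, since sqrt(1 + 4 + 4) = 3
print(u)        # each component divided by 3: same direction as x
print(norm(u))  # approximately 1.0 (floating-point rounding)
```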

Geometrically, the components \( x_1 \) and \( x_2 \) are the legs of a right triangle and the vector is its hypotenuse, so its length is \( \sqrt{x_1^2 + x_2^2} \). Normalizing divides every component by that length, which places the vector on the unit circle: the same direction, with length 1.
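For example, legs of 3 and 4 give the classic 3-4-5 right triangle, and dividing by 5 lands the vector on the unit circle:

\[ \lVert [3, 4] \rVert \;=\; \sqrt{3^2 + 4^2} \;=\; \sqrt{25} \;=\; 5, \qquad \frac{[3,\ 4]}{5} \;=\; [0.6,\ 0.8], \qquad 0.6^2 + 0.8^2 \;=\; 1 \]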

The dot product

The dot product multiplies corresponding components and sums the results, yielding a single scalar:

\[ \mathbf{a} \cdot \mathbf{b} \;=\; a_1 b_1 + a_2 b_2 + \cdots + a_n b_n \]

The sign of the dot product reflects alignment: it is positive when the angle between the two vectors is less than 90°, zero when they are perpendicular (orthogonal), and negative when the angle exceeds 90°. Its magnitude also grows with the vectors' lengths. This weighted sum is the core computation of a single neuron (before a bias is added and an activation is applied), and it underlies search, recommendation, and the attention mechanism in transformers.
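A minimal sketch of the dot product in plain Python, with one example per sign case. The neuron part uses hypothetical weights, inputs, and bias purely for illustration; a real neuron would then pass `z` through an activation function.

```python
def dot(a, b):
    """Multiply matching components and add up the products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(ai * bi for ai, bi in zip(a, b))

print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
print(dot([1, 0], [2, 1]))        # 2: positive, acute angle
print(dot([1, 0], [0, 3]))        # 0: perpendicular
print(dot([1, 0], [-2, 1]))       # -2: negative, obtuse angle

# A neuron's pre-activation value: dot product of weights and inputs, plus a bias.
weights = [0.5, -1.0, 2.0]  # hypothetical values
inputs = [1.0, 2.0, 0.5]    # hypothetical values
bias = 0.25                 # hypothetical value
z = dot(weights, inputs) + bias
print(z)  # 0.5 - 2.0 + 1.0 + 0.25 = -0.25
```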

The geometry determines the sign: the dot product equals the signed length of the projection (the "shadow") of \( \mathbf{a} \) onto \( \mathbf{b} \), multiplied by \( \lVert \mathbf{b} \rVert \). That signed length is \( \lVert \mathbf{a} \rVert \cos\theta \), where \( \theta \) is the angle between the vectors, so the projection is positive at acute angles, zero at 90°, and negative beyond 90°.
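The projection picture can be checked numerically. This sketch, using illustrative helper names, fixes \( \mathbf{b} \) along the x-axis, sweeps a unit vector \( \mathbf{a} \) through a few angles, and confirms that the shadow times \( \lVert \mathbf{b} \rVert \) reproduces the dot product.

```python
import math

def dot(a, b):
    """Sum of products of matching components."""
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(x):
    """Length of x, computed as the square root of x . x."""
    return math.sqrt(dot(x, x))

def scalar_projection(a, b):
    """Signed length of a's shadow on the line through b, i.e. ||a|| cos(theta)."""
    return dot(a, b) / norm(b)

b = [2.0, 0.0]  # fixed vector along the x-axis with length 2
for degrees in (30, 90, 150):
    t = math.radians(degrees)
    a = [math.cos(t), math.sin(t)]  # unit vector at `degrees` from b
    shadow = scalar_projection(a, b)
    # shadow * ||b|| reproduces the dot product (up to rounding)
    print(degrees, round(shadow, 3), round(shadow * norm(b), 3), round(dot(a, b), 3))
# The shadow is positive at 30°, zero (up to rounding) at 90°, negative at 150°.
```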

Cosine similarity: direction without size

To ask "do these point the same way?" while ignoring length, divide the dot product by both norms. The result is the cosine of the angle between them — cosine similarity:

\[ \cos\theta \;=\; \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert\,\lVert \mathbf{b} \rVert} \]

It ranges from +1 (same direction) through 0 (perpendicular) to −1 (opposite directions). It is defined only when neither vector is the zero vector.

Changing the lengths of the two vectors leaves the cosine similarity unchanged; only the angle between them affects it: +1 at 0°, 0 at 90°, −1 at 180°. This is what "direction independent of magnitude" means.
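To see the length independence numerically, this sketch (with the illustrative helper `at_angle`) holds the angle at 60° while varying both lengths: the dot product changes, but the cosine similarity does not.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cosine_similarity(a, b):
    # math.hypot(*v) is the Euclidean norm of v (multi-argument form: Python 3.8+)
    return dot(a, b) / (math.hypot(*a) * math.hypot(*b))

def at_angle(length, degrees):
    """2-D vector of the given length, rotated `degrees` counterclockwise from the x-axis."""
    t = math.radians(degrees)
    return [length * math.cos(t), length * math.sin(t)]

# Fix the angle at 60° and vary only the lengths.
for len_a, len_b in [(1, 1), (5, 2), (100, 3)]:
    a = at_angle(len_a, 0)
    b = at_angle(len_b, 60)
    print(len_a, len_b, round(dot(a, b), 3), round(cosine_similarity(a, b), 6))
# The dot product grows with the lengths (0.5, 5.0, 150.0);
# the cosine similarity stays at cos(60°) = 0.5.
```

This holds for positive lengths only: multiplying a vector by a negative number reverses its direction and flips the sign of the cosine similarity.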

AI anchor: how LLMs represent semantic meaning. A language model turns every word (more precisely, every token) into an embedding, a vector of hundreds or thousands of numbers learned so that words used in similar contexts point in similar directions. "King" and "queen" point in nearly the same direction; "king" and "banana" do not. Relatedness between embeddings is commonly measured with exactly the cosine similarity above: semantic search, recommendation, and retrieval-augmented chat systems typically rank results by cosine similarity between embedding vectors. The 2-D arrows in this module illustrate the same idea that these systems apply in hundreds of dimensions (768 in some models).
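Ranking by cosine similarity can be sketched with toy vectors. The 3-D "embeddings" below are made up for illustration only; real embeddings are learned by a model and have far more dimensions. The query vector is a hypothetical stand-in for "king".

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Made-up 3-D vectors for illustration only (not real model embeddings).
toy_embeddings = {
    "queen":  [0.9, 0.8, 0.1],
    "prince": [0.8, 0.6, 0.2],
    "banana": [0.1, 0.0, 0.9],
}
query = [1.0, 0.7, 0.0]  # hypothetical stand-in for "king"

# Sort candidate words from most to least similar to the query.
ranked = sorted(
    toy_embeddings,
    key=lambda word: cosine_similarity(query, toy_embeddings[word]),
    reverse=True,
)
for word in ranked:
    print(word, round(cosine_similarity(query, toy_embeddings[word]), 3))
```

With these toy values, "queen" and "prince" score close to 1 and "banana" close to 0, mirroring the intuition in the paragraph above.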

Check your understanding

For each pair of vectors, determine the sign of their dot product and their approximate similarity.
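A few illustrative pairs, chosen here as an assumption rather than taken from the original exercise, can be classified with a small checker. `describe` is a hypothetical helper that reports the sign of the dot product and the cosine similarity rounded to two places.

```python
import math

def describe(a, b):
    """Return the sign of a . b and the cosine similarity rounded to 2 places."""
    dot = sum(x * y for x, y in zip(a, b))
    cos = dot / (math.hypot(*a) * math.hypot(*b))
    sign = "positive" if dot > 0 else "negative" if dot < 0 else "zero"
    return sign, round(cos, 2)

# Illustrative pairs: parallel, perpendicular, and nearly opposite.
pairs = [([2, 1], [4, 2]), ([1, 3], [3, -1]), ([1, 1], [-1, -0.5])]
for a, b in pairs:
    print(a, b, describe(a, b))
```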


Why this matters next: a row of a dataset is a vector, and a whole dataset is a stack of them, a matrix (Module 5). The dot product is the building block of matrix multiplication, which is how a neural-network layer transforms data. Cosine similarity is also a common ranking measure in the retrieval-augmented generation (RAG) and recommendation systems covered in Course 5 (Inside Large Language Models).
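As a preview, a minimal sketch assuming a matrix stored as a list of rows: multiplying a matrix by a vector amounts to one dot product per row. `W` holds made-up values for a hypothetical tiny layer; a real layer would also add a bias and apply an activation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    """Matrix-vector product: one dot product per row of W."""
    return [dot(row, x) for row in W]

W = [[1, 0, 2],   # hypothetical 2x3 weight matrix
     [0, 1, -1]]
x = [3, 4, 5]     # one data point with 3 features
print(matvec(W, x))  # [1*3 + 0*4 + 2*5, 0*3 + 1*4 - 1*5] = [13, -1]
```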

Summary: data is represented as vectors (lists of feature values); a vector's length is its norm \( \sqrt{\sum x_i^2} \); the dot product \( \sum a_i b_i \) measures alignment; and cosine similarity normalizes the dot product into an angle-based score in \( [-1, +1] \), a standard measure by which models quantify the relatedness of two items.
