Module 5 — Matrices & Transformations

Pillar 2 · Linear algebra · hands-on · about 30 minutes.

A single data point is a vector; a whole dataset is a grid of numbers, a matrix. But matrices are also verbs: multiplying by one transforms data by rotating, scaling, or mixing features. At the heart of a standard neural-network layer is one such transformation. This module makes matrix multiplication concrete and shows why it is the core of a neural-network layer.

A matrix is a grid — of data, or of a transformation

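A matrix is a rectangle of numbers with rows and columns. It can be read two ways: as data, where the grid holds a whole dataset, or as a transformation, where multiplying by the grid changes the vectors it acts on.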

Matrix × vector: a stack of dot products

To multiply a matrix by a vector, take the dot product of each row with the vector (the move you built in Module 4). Each row produces one number; stack them and you get the output vector:

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix} \]
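The formula above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production routine; the helper names `dot` and `matvec` and the sample numbers are assumptions made for this example.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector: one dot product per row."""
    return [dot(row, vector) for row in matrix]


M = [[1, 2],
     [3, 4]]
v = [5, 6]

# Row 1: 1*5 + 2*6 = 17, row 2: 3*5 + 4*6 = 39
print(matvec(M, v))  # [17, 39]
```

Each output entry comes from exactly one row of the matrix, which is why the output has as many entries as the matrix has rows.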

The visualizer below lets you set the matrix and watch what it does to a vector — and to a whole grid of points. Try the rotation, scale, and shear presets to see the transformation.
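For readers without the visualizer, here is a small sketch of what rotation, scale, and shear matrices do to a single vector. The specific matrices are illustrative assumptions (a 90-degree rotation, a uniform scale by 2, a horizontal shear), not necessarily the values behind the presets.

```python
def matvec(matrix, vector):
    """Matrix times vector: one dot product per row."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Illustrative matrices (assumed values, not the visualizer's actual presets).
ROTATE_90 = [[0, -1],
             [1,  0]]   # general form: [[cos t, -sin t], [sin t, cos t]]
SCALE_2 = [[2, 0],
           [0, 2]]      # doubles both coordinates
SHEAR_X = [[1, 1],
           [0, 1]]      # adds y to x, leaves y alone

v = [1, 2]
for name, M in [("rotate", ROTATE_90), ("scale", SCALE_2), ("shear", SHEAR_X)]:
    print(name, matvec(M, v))
# rotate [-2, 1]
# scale [2, 4]
# shear [3, 2]
```

Applying the same matrix to every point of a grid gives the whole-plane picture: every point moves according to the same rule.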

Matrix × matrix: chaining transformations

Multiplying two matrices means applying one transformation after another. The entry in row \( i \), column \( j \) of the result is the dot product of row \( i \) of the first with column \( j \) of the second. The inner dimensions must match — an \( m\times n \) times an \( n\times p \) gives an \( m\times p \). Build one entry at a time below and watch the row-meets-column pattern.
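The row-meets-column rule and the dimension check can be sketched as follows. The function names and the sample matrices are assumptions for illustration. One detail worth noting: with column vectors, the product \( AB \) applies \( B \) first and then \( A \), which the last lines check numerically.

```python
def matmul(A, B):
    """Multiply an m x n matrix by an n x p matrix (both lists of rows)."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("inner dimensions must match")
    # Entry (i, j) is row i of A dotted with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


def matvec(matrix, vector):
    """Matrix times vector: one dot product per row."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2

C = matmul(A, B)       # (2 x 3) times (3 x 2) gives 2 x 2
print(C)               # [[58, 64], [139, 154]]

# Chaining: applying B and then A matches applying the single matrix AB.
x = [1, 1]
print(matvec(A, matvec(B, x)))  # [122, 293]
print(matvec(C, x))             # [122, 293]
```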

Two special matrices: identity and transpose
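As a hedged sketch using the standard definitions (the function names are assumptions for this example): the identity matrix has 1s on its diagonal and 0s elsewhere, so multiplying by it leaves a vector unchanged, and the transpose swaps rows and columns, turning an \( m\times n \) matrix into an \( n\times m \) one.

```python
def identity(n):
    """n x n matrix with 1s on the diagonal and 0s elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def transpose(A):
    """Swap rows and columns: entry (i, j) moves to (j, i)."""
    return [list(col) for col in zip(*A)]


def matvec(matrix, vector):
    """Matrix times vector: one dot product per row."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


print(identity(2))                   # [[1, 0], [0, 1]]
print(matvec(identity(2), [3, -4]))  # [3, -4]  (the vector is unchanged)

A = [[1, 2, 3],
     [4, 5, 6]]                      # 2 x 3
print(transpose(A))                  # [[1, 4], [2, 5], [3, 6]]  (3 x 2)
```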

AI anchor: a neural-network layer is a matrix multiply

A neural-network layer takes an input vector \( \mathbf{x} \), multiplies it by a weight matrix \( W \), adds a bias \( \mathbf{b} \), and applies a squashing function \( f \) to each entry: \( \mathbf{h} = f(W\mathbf{x} + \mathbf{b}) \). The matrix \( W \) is exactly the kind of transformation you're dragging: each output number is one row of \( W \) dotted with the input, plus its bias entry, passed through \( f \). Training a network is searching for the weight matrices (and biases) that transform raw input into useful output. Stacking layers chains these steps: the forward pass of a deep net is a sequence of the multiplications you're building here, with \( f \) applied between them.
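The layer formula \( \mathbf{h} = f(W\mathbf{x} + \mathbf{b}) \) can be sketched directly. This is an illustration under stated assumptions: the name `dense_layer`, the choice of `tanh` as the squashing function, and all the numbers are made up for the example, and real frameworks run the same arithmetic on optimized array libraries.

```python
import math


def dense_layer(W, x, b, f=math.tanh):
    """Compute h = f(Wx + b), applying f to each output entry.

    W is a list of rows, x the input vector, b one bias per row.
    tanh is an assumed choice of squashing function.
    """
    if len(W) != len(b) or any(len(row) != len(x) for row in W):
        raise ValueError("shapes of W, x, and b do not line up")
    # One dot product per row of W, plus that row's bias entry.
    pre_activation = [sum(w * xi for w, xi in zip(row, x)) + bi
                      for row, bi in zip(W, b)]
    return [f(z) for z in pre_activation]


W = [[1.0, -1.0],
     [0.5,  0.5],
     [0.0,  2.0]]      # 3 x 2: maps a 2-D input to a 3-D output
x = [2.0, 1.0]
b = [0.0, -0.5, -3.0]

# Wx + b = [1.0, 1.0, -1.0] before the squashing function is applied.
h = dense_layer(W, x, b)
print([round(v, 3) for v in h])  # [0.762, 0.762, -0.762]
```

Note that the shape of `W` sets the layer's input and output sizes: three rows give three outputs, and two columns require a two-entry input.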

Trace the multiply

Predict entries and dimensions of matrix products. You'll get a score.

Why this matters next

Matrix multiplication is the workhorse operation of modern AI: every forward pass and every attention head is built from matmuls, and even an embedding lookup is equivalent to multiplying a one-hot vector by a matrix. The transformations you're seeing in 2-D are what GPUs carry out at massive scale in hundreds of dimensions. In Course 4 you'll build a neural-network layer and recognize it instantly: at its core is \( W\mathbf{x} + \mathbf{b} \), the thing on this page.

One-sentence summary: a matrix is both a grid of data and a transformation; multiplying matrix-by-vector is a stack of row-with-vector dot products that rotates, scales, or mixes the input, and that operation, with a bias added to give \( W\mathbf{x} + \mathbf{b} \), is the core of one neural-network layer.

Next: Derivatives & Gradient Descent →