Module 5 — Matrices & Transformations

Pillar 2 · Linear algebra · hands-on · about 30 minutes.

A single data point is a vector; a complete dataset is a two-dimensional array of numbers, a matrix. A matrix also represents a transformation: multiplying a vector by a matrix rotates it, scales it, or mixes its features. The core of every fully connected neural-network layer is one such transformation, followed by a bias and an activation function. This module develops matrix multiplication concretely and shows why it sits at the heart of the neural-network layer.

A matrix as data and as a transformation

A matrix is a rectangular array of numbers organized into rows and columns. It admits two interpretations:

As data: each row is one example, each column one feature. Five houses with three features each is a 5×3 matrix.
As a transformation: a matrix acts on a vector to produce a new vector — stretching, rotating, or projecting it. This is the interpretation central to machine-learning models.
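
To make the data interpretation concrete, here is a minimal Python sketch of a 5×3 matrix stored as a list of rows. The feature names and every value are invented for illustration; they do not come from the lesson.

```python
# Hypothetical data: 5 houses (rows) x 3 features (columns).
# Assumed columns: area in square metres, number of bedrooms, age in years.
houses = [
    [120.0, 3, 15],
    [85.0, 2, 40],
    [200.0, 5, 5],
    [60.0, 1, 30],
    [150.0, 4, 10],
]

n_rows = len(houses)       # number of examples
n_cols = len(houses[0])    # number of features
print((n_rows, n_cols))    # (5, 3)

# One row is one example (a feature vector); one column is one feature.
second_house = houses[1]
bedrooms = [row[1] for row in houses]
print(second_house)        # [85.0, 2, 40]
print(bedrooms)            # [3, 2, 5, 1, 4]
```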

Matrix–vector multiplication: a collection of dot products

To multiply a matrix by a vector, compute the dot product of each row with the vector (the operation introduced in Module 4). Each row yields one component; together they form the output vector:

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix} \]

As a worked example, let \( A=\begin{bmatrix} 2 & -1 \\ 1 & 3 \end{bmatrix} \) and \( \mathbf{x}=\begin{bmatrix} 3 \\ 2 \end{bmatrix} \). The dot product of row 1 with the vector gives the first component, and that of row 2 the second:

\[ \begin{bmatrix} 2 & -1 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} (2)(3)+(-1)(2) \\ (1)(3)+(3)(2) \end{bmatrix} = \begin{bmatrix} 6-2 \\ 3+6 \end{bmatrix} = \begin{bmatrix} 4 \\ 9 \end{bmatrix} \]
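
The same computation can be sketched in plain Python. The helper names `dot` and `matvec` are chosen here for illustration; the code reproduces the worked example above, one dot product per row.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x: one dot product per row."""
    return [dot(row, x) for row in A]

A = [[2, -1],
     [1, 3]]
x = [3, 2]

result = matvec(A, x)
print(result)  # [4, 9]
```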

Step through it yourself below — each result entry lights up the matrix row and the vector it is the dot product of.

Each output entry is the dot product of one row of the matrix and the vector: row 1 gives \( (2)(3)+(-1)(2)=4 \), row 2 gives \( (1)(3)+(3)(2)=9 \).

The visualizer below allows you to specify the matrix and observe its effect on a vector and on an entire grid of points. The rotation, scale, and shear presets demonstrate the corresponding transformations geometrically.
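
Rotation, scaling, and shear each correspond to a standard 2×2 matrix. The sketch below uses the textbook forms of those matrices; the exact parameters behind the visualizer's presets are not given in the lesson, so the angle, scale factors, and shear amount here are assumptions.

```python
import math

def matvec(A, x):
    """One dot product per row of A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rotation(theta):
    """Counterclockwise rotation by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def scaling(sx, sy):
    """Scale the x-component by sx and the y-component by sy."""
    return [[sx, 0], [0, sy]]

def shear_x(k):
    """Horizontal shear: x is shifted by k times y, y is unchanged."""
    return [[1, k], [0, 1]]

v = [1, 1]
rotated = matvec(rotation(math.pi / 2), v)   # quarter turn
scaled = matvec(scaling(2, 0.5), v)
sheared = matvec(shear_x(1), v)

print([round(c, 6) for c in rotated])  # [-1.0, 1.0]
print(scaled)                          # [2, 0.5]
print(sheared)                         # [2, 1]
```

The rotated result is rounded before printing because `math.cos(math.pi / 2)` is a tiny nonzero floating-point number rather than exactly zero.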

Matrix × matrix: chaining transformations

Multiplying two matrices chains two transformations. Because \( (AB)\mathbf{x} = A(B\mathbf{x}) \), the product \( AB \) applies \( B \) first and then \( A \). Order matters: in general \( AB \neq BA \). The entry in row \( i \), column \( j \) of the result is the dot product of row \( i \) of the first matrix with column \( j \) of the second. The inner dimensions must match: an \( m\times n \) matrix times an \( n\times p \) matrix gives an \( m\times p \) matrix. Build one entry at a time below and watch the row-meets-column pattern.
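
The row-meets-column rule can be sketched in plain Python as follows. The function names and the matrices `B` and `R` are illustrative choices, not part of the lesson; `A` and `x` are reused from the worked example earlier.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (both lists of rows)."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("inner dimensions must match")
    columns = list(zip(*B))  # columns of B
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]

def matvec(A, x):
    """One dot product per row of A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[2, -1], [1, 3]]          # 2 x 2
B = [[1, 0, 2], [0, 1, 1]]     # 2 x 3 (hypothetical values)
C = matmul(A, B)               # 2 x 3
print(C)                       # [[2, -1, 3], [1, 3, 5]]

# Chaining: multiplying by the product equals applying R first, then A.
R = [[1, 1], [0, 1]]           # a shear
x = [3, 2]
print(matvec(matmul(A, R), x))     # [8, 11]
print(matvec(A, matvec(R, x)))     # [8, 11]

# Swapping the order of the factors gives a different matrix.
print(matmul(A, R))            # [[2, 1], [1, 4]]
print(matmul(R, A))            # [[3, 2], [1, 3]]
```

Calling `matmul(B, A)` raises `ValueError`, since a 2×3 matrix cannot be multiplied by a 2×2 matrix.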

Two special matrices: the identity and the transpose

The identity matrix \( I \) is the square matrix with \( 1 \)s on its main diagonal and \( 0 \)s everywhere else. In two dimensions,

\[ I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \]

It is the multiplicative identity of matrix multiplication, playing the role the number \( 1 \) plays for the real numbers. Just as \( 1 \cdot x = x \) for every real number \( x \), we have \( I\mathbf{x} = \mathbf{x} \) for every vector \( \mathbf{x} \) of matching size, and \( IA = AI = A \) for every square matrix \( A \) of the same size as \( I \). (For a non-square \( A \) the same holds with an identity of the appropriate size on each side.) Multiplying by \( I \) leaves its argument unchanged.
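
A quick check of these identities in plain Python, as a sketch with illustrative helper names and the matrix and vector from the earlier worked example:

```python
def identity(n):
    """n x n matrix with 1s on the main diagonal and 0s elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matvec(A, x):
    """One dot product per row of A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matmul(A, B):
    """Row-by-column product of two compatible matrices (lists of rows)."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]

I = identity(2)
A = [[2, -1], [1, 3]]
x = [3, 2]

print(I)                    # [[1, 0], [0, 1]]
print(matvec(I, x) == x)    # True
print(matmul(I, A) == A)    # True
print(matmul(A, I) == A)    # True
```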

The transpose \( A^{\top} \) is formed by reflecting a matrix across its main diagonal: row \( i \) of \( A \) becomes column \( i \) of \( A^{\top} \), so the entries satisfy \( (A^{\top})_{ij} = A_{ji} \). An \( m \times n \) matrix becomes \( n \times m \). For example,

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad A^{\top} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}. \]

The transpose appears constantly when aligning dimensions so that a product is defined. Watch each row swing into the matching column below.

The transpose reflects a matrix across its main diagonal: row \( i \) becomes column \( i \), so a \( 2\times 3 \) matrix becomes \( 3\times 2 \).
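
In Python, one common way to sketch the transpose of a list-of-rows matrix is with `zip(*A)`, which groups the entries column by column. The example also shows the dimension-aligning role of the transpose: a \( 2\times 3 \) matrix cannot be multiplied by itself, but it can be multiplied by its \( 3\times 2 \) transpose. The helper names are illustrative.

```python
def transpose(A):
    """Row i of A becomes column i of the result."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Row-by-column product; the inner dimensions must match."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("inner dimensions must match")
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]              # 2 x 3
At = transpose(A)            # 3 x 2
print(At)                    # [[1, 4], [2, 5], [3, 6]]

# Transposing twice returns the original matrix.
print(transpose(At) == A)    # True

# (2 x 3) times (3 x 2) is defined and gives a 2 x 2 result.
print(matmul(A, At))         # [[14, 32], [32, 77]]
```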

AI anchor: a neural-network layer is, fundamentally, a matrix multiplication

A fully connected neural-network layer takes an input vector \( \mathbf{x} \), multiplies it by a weight matrix \( W \), adds a bias vector \( \mathbf{b} \), and applies a nonlinear activation function: \( \mathbf{h} = f(W\mathbf{x} + \mathbf{b}) \). The matrix \( W \) is exactly the kind of transformation you are dragging above: each entry of \( W\mathbf{x} \) is the dot product of one row of \( W \) with the input \( \mathbf{x} \). Training a network is the search for the weight matrices and biases that map raw input to useful output. Stacking layers chains these steps: the forward pass of a deep network alternates the matrix multiplications you are building here with activation functions.
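
To see what the activation contributes when layers are stacked, consider two layers with the activation left out. The subscripts 1 and 2 are notation introduced here for illustration:

\[ W_2(W_1\mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2 = (W_2 W_1)\mathbf{x} + (W_2\mathbf{b}_1 + \mathbf{b}_2) \]

Without \( f \), the two layers collapse into a single layer with weight matrix \( W_2 W_1 \) and bias \( W_2\mathbf{b}_1 + \mathbf{b}_2 \). The nonlinearity between layers is what keeps a deep network from reducing to one matrix.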

Step through one layer — multiply by \( W \), add the bias, then apply the activation — and watch \( \mathbf{x} \) become \( \mathbf{h} \):

One layer computes \( \mathbf{h} = f(W\mathbf{x} + \mathbf{b}) \): matrix-multiply the input, add the bias, then apply the activation \( f \).
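
The three steps can be sketched in plain Python. The lesson does not say which activation \( f \) is used, so ReLU is assumed here, and the values of `W`, `b`, and `x` are hypothetical.

```python
def matvec(W, x):
    """One dot product per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(z):
    """Assumed activation: replace negative entries with 0."""
    return [max(0.0, v) for v in z]

def layer(W, b, x, f=relu):
    """Compute h = f(Wx + b) for one fully connected layer."""
    z = [wx + bi for wx, bi in zip(matvec(W, x), b)]  # Wx + b
    return f(z)

W = [[1, -2],
     [0.5, 1]]      # hypothetical weights
b = [0, -1]         # hypothetical bias
x = [3, 2]

pre_activation = [wx + bi for wx, bi in zip(matvec(W, x), b)]
h = layer(W, b, x)
print(pre_activation)  # [-1, 2.5]
print(h)               # [0.0, 2.5]
```

Passing a different function as `f` swaps the activation without touching the matrix multiplication or the bias step.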

Check your understanding

Determine the entries and dimensions of the given matrix products.

Why this matters next

Matrix multiplication is among the most heavily executed operations in modern AI: every forward pass through a dense layer and every attention head is built from matrix multiplications, and an embedding lookup is equivalent to multiplying a one-hot vector by the embedding matrix. The transformations you're seeing in 2-D are what GPUs compute at very large scale, in hundreds or thousands of dimensions. In Course 4 you'll build a neural-network layer and recognize it instantly: its core is \( W\mathbf{x} + \mathbf{b} \), the operation you built on this page, followed by an activation.

One-sentence summary: a matrix is both a grid of data and a transformation; multiplying a matrix by a vector is a stack of row-with-vector dot products that rotates, scales, or mixes the input, and that operation, \( W\mathbf{x} + \mathbf{b} \) followed by an activation, is one layer of a neural network.
