Module 6 — What Depth Buys You

What makes it deep · hands-on · about 30 minutes.

You have the whole engine now: forward, loss, backprop, update. So why not stop at one layer? Because a shallow network of modest width can only draw simple boundaries. The word "deep" in "deep learning" means many layers stacked, and each extra layer lets the network bend its decision boundary in more places. In this module you will hand a network a famously nasty dataset, two interleaved spirals, and watch a shallow model fail while a deeper one carves out a boundary that wraps around both arms.

Why one layer isn't enough

A single neuron draws a straight line (Module 1). One hidden layer can bend that line into a handful of folds (Module 2). But some patterns — a spiral, a checkerboard, anything where the same class shows up in many separate regions — need many folds. Stacking layers gives you that cheaply: each layer reshapes the space the next layer sees, so folds compose into curves, and curves compose into spirals.
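
To make "folds compose" concrete, here is a minimal one-dimensional sketch. It is an illustration, not code from this course: the names `tent`, `stacked`, and `count_linear_pieces` are hypothetical, and the "layer" is a hand-built pair of ReLU units rather than anything trained. Each layer folds the interval [0, 1] in half, so stacking the same layer doubles the number of straight pieces every time.

```python
def relu(z):
    return max(0.0, z)

def tent(x):
    # One tiny "layer": two ReLU units combined linearly.
    # It rises from 0 to 1 on [0, 0.5], then falls back to 0 on [0.5, 1]: one fold.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def stacked(x, depth):
    # Feed the output of each fold into the next one.
    for _ in range(depth):
        x = tent(x)
    return x

def count_linear_pieces(depth, grid=1024):
    # Sample the function on a fine grid and count how often the slope flips sign.
    ys = [stacked(i / grid, depth) for i in range(grid + 1)]
    slopes = [b - a for a, b in zip(ys, ys[1:])]
    flips = sum(1 for s, t in zip(slopes, slopes[1:]) if (s > 0) != (t > 0))
    return flips + 1

for depth in range(1, 5):
    print(depth, count_linear_pieces(depth))
# depth 1 -> 2 pieces, depth 2 -> 4, depth 3 -> 8, depth 4 -> 16
```

Four stacked layers use 8 ReLU units in total and produce 16 pieces. A single hidden layer of n ReLU units on a one-dimensional input can produce at most n + 1 pieces, so matching that in one layer would take about 15 units, and the gap keeps doubling with every added layer.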

Below, pick a depth and width, then hit Train. Start shallow (1 layer) and watch the boundary stay too simple to separate the spirals. Then add layers and rerun — the same data, the same training loop, but now the boundary can wrap.
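
The lesson does not say how the spiral data is generated, so the following is an assumed recipe, one common way to build two interleaved spirals. The function name `make_spirals` and its parameters (`n_per_arm`, `turns`, `noise`, `seed`) are hypothetical. The second arm is simply the first arm rotated by half a turn, which is what makes the two classes wind around each other.

```python
import math
import random

def make_spirals(n_per_arm=100, turns=1.5, noise=0.0, seed=0):
    """Return (points, labels) for two interleaved spiral arms.

    Assumed recipe: radius grows linearly from 0 to 1 along each arm,
    and arm 1 is arm 0 rotated by 180 degrees. Requires n_per_arm >= 2.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for i in range(n_per_arm):
        t = i / (n_per_arm - 1)            # position along the arm, 0..1
        radius = t
        theta = turns * 2.0 * math.pi * t  # angle swept so far
        for label, offset in ((0, 0.0), (1, math.pi)):
            x = radius * math.cos(theta + offset) + rng.gauss(0.0, noise)
            y = radius * math.sin(theta + offset) + rng.gauss(0.0, noise)
            points.append((x, y))
            labels.append(label)
    return points, labels

points, labels = make_spirals(n_per_arm=100, noise=0.02)
print(len(points), labels.count(0), labels.count(1))
```

Each point is an (x, y) pair, so a network trained on this data takes two inputs and predicts which arm the point belongs to.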

Adding depth in Keras is just adding lines — read only, nothing to install
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Input(shape=(2,)),               # two input features: the (x, y) coordinates of a point
    Dense(16, activation='relu'),    # layer 1
    Dense(16, activation='relu'),    # layer 2  ← depth
    Dense(16, activation='relu'),    # layer 3  ← more depth
    Dense(1, activation='sigmoid'),  # output: probability of one class
])

Each extra Dense line is one more layer — one more chance to bend the boundary. That is literally all "going deeper" means in code; the training loop from Module 4 is unchanged.
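
To see what one extra layer costs, here is a small sketch that counts trainable parameters by hand. `dense_param_count` is a hypothetical helper written for this illustration, not part of Keras; it assumes every layer is fully connected with a bias, as in the model above.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes starts with the number of inputs and then lists the units
    in each layer, for example [2, 16, 16, 16, 1] for the model above.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # one weight per connection, one bias per unit
    return total

three_hidden = dense_param_count([2, 16, 16, 16, 1])
two_hidden = dense_param_count([2, 16, 16, 1])
print(three_hidden, two_hidden, three_hidden - two_hidden)
# prints: 609 337 272
```

Each additional 16-unit layer adds 16 × 16 weights plus 16 biases, or 272 parameters, which is a modest price for another round of bending.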

AI anchor: depth is the whole game at scale

The networks behind image recognition and language have not just three layers but dozens to hundreds, organized into repeating blocks. A large language model is a very deep stack of structurally identical layers (called transformer blocks). Everything you watch in this module, depth letting the boundary curve where the same budget of neurons in a single layer cannot, is the same reason those giant models can capture structure that shallow models of comparable size could not. Depth is not a detail; it is the defining idea of the field.

Why this matters next

Depth is powerful, maybe too powerful. A network deep enough to wrap a spiral is also deep enough to memorize the training points and fail on new ones. Module 7 is about that danger, overfitting, and the tools that keep a network honest: weight decay, dropout, and early stopping.

One-sentence summary: "deep" means many layers stacked, and each extra layer lets the decision boundary bend in more places, so a deep network can carve out shapes (like two interleaved spirals) that a shallow one of similar size cannot, which is why depth, not just width, is the defining idea of the field.
