
Module 7 — Training Real Networks

What makes it deep · hands-on · about 30 minutes.

Module 6 showed that a deep network is powerful enough to draw almost any boundary. That power has a dark side: a network flexible enough to wrap a spiral is also flexible enough to memorize the exact training points — noise and all — and then fail on data it has never seen. This is overfitting, the single biggest practical problem in training real networks. This module shows you how to spot it and three standard tools to fight it.

Spotting overfitting: two curves, not one

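The trick is to hold out some data the network never trains on, a validation set, and watch its loss separately. While training, compare the two curves. As long as training loss and validation loss fall together, the network is learning the real pattern. Once training loss keeps falling while validation loss flattens or starts to rise, the network has begun memorizing its training points, and that growing gap is overfitting.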
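As a rough sketch of what "watching two curves" means in code, the helper below takes a list of per-epoch training losses and a list of per-epoch validation losses and reports where validation loss bottomed out and how wide the gap is at the end. The function name `overfitting_report` and the loss values are made up for illustration; they do not come from a real training run.

```python
def overfitting_report(train_loss, val_loss):
    """Summarize where validation loss bottomed out and the final train/val gap."""
    # Epoch (0-indexed) with the lowest validation loss
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_epoch": best_epoch,
        "best_val_loss": val_loss[best_epoch],
        # Gap between the two curves at the last epoch
        "final_gap": val_loss[-1] - train_loss[-1],
        # Overfitting signature: validation got worse after its best epoch
        # while training loss kept improving
        "overfitting": (val_loss[-1] > val_loss[best_epoch]
                        and train_loss[-1] < train_loss[best_epoch]),
    }

# Illustrative numbers only: training loss keeps falling,
# validation loss turns around after epoch 3.
train = [1.00, 0.70, 0.50, 0.35, 0.25, 0.18]
val = [1.05, 0.80, 0.65, 0.62, 0.68, 0.77]
report = overfitting_report(train, val)
print(report)
```

With these numbers the validation loss is lowest at epoch 3 and climbs afterwards even though the training loss is still dropping, which is the pattern to look for.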

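Three tools keep the gap small: weight decay, which penalizes large weights; dropout, which randomly switches off neurons during training; and early stopping, which halts training once the validation loss stops improving.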
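To make two of those tools, weight decay and dropout, less abstract, here is a minimal plain-Python sketch of the arithmetic behind them. It assumes the common formulations: weight decay as an L2 penalty (a coefficient times the sum of squared weights, added to the loss) and "inverted" dropout (survivors are scaled up so the expected activation is unchanged). The function names are illustrative, not a library API.

```python
import random

def l2_penalty(weights, lam):
    """Weight decay as an L2 penalty: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`
    and scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

weights = [0.5, -2.0, 1.5]
# 1e-3 * (0.25 + 4.0 + 2.25): large weights dominate the penalty
penalty = l2_penalty(weights, lam=1e-3)

rng = random.Random(0)  # seeded so the run is repeatable
dropped = dropout([1.0] * 10, rate=0.3, rng=rng)
print(penalty, dropped)
```

The penalty grows with the square of each weight, so the optimizer is nudged toward smaller weights and smoother boundaries. Dropout is applied only during training; at prediction time all activations are kept.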

Below, train a deliberately oversized network on a small, noisy dataset and watch the two curves split apart. Then flip on weight decay and dropout and rerun — the gap shrinks, and the validation loss (the one that actually matters) drops.


Regularization in Keras — read only, nothing to install
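from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential()
model.add(Dense(32, activation='relu',
                kernel_regularizer=l2(1e-3)))   # weight decay (L2 penalty on this layer's weights)
model.add(Dropout(0.3))                          # dropout: zero a random 30% of activations each step
model.add(Dense(1, activation='sigmoid'))        # output layer (assumes binary labels)
model.compile(optimizer='adam', loss='binary_crossentropy')

# X, y: your training inputs and labels (not defined in this snippet)
model.fit(X, y, epochs=200, validation_split=0.3,   # hold out 30% to watch
          callbacks=[EarlyStopping(patience=10)])   # stop after 10 epochs without val_loss improving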

All three tools are one line each. The validation_split is what produces the second curve you'll watch below; the rest keep the gap between the curves from blowing open.
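The patience rule behind EarlyStopping(patience=10) can be sketched in a few lines of plain Python. This is an illustration of the idea, not the Keras source: the function name and the loss values are invented, and it assumes any strictly lower validation loss counts as an improvement.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once `patience` consecutive epochs pass
    without a new best validation loss."""
    best = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best: remember it and reset the counter
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_epoch

# Illustrative validation losses: best at epoch 2, then no improvement
val = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(val, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

Note that training stops a few epochs after the best one. In Keras, the model keeps the weights from the last epoch unless you pass restore_best_weights=True to EarlyStopping.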

AI anchor: why huge models don't just memorize

Models with billions of weights have more than enough capacity to memorize their training data outright. A large part of the reason they generalize instead comes down to the ideas here, scaled up: regularization such as weight decay and dropout, enormous and diverse datasets, and careful stopping. Teams training large models watch a validation curve and fight the same overfitting gap you are about to open and close by hand.


Why this matters next

You now have everything: a neuron, activations, a network, the training loop, backprop, depth, and the tools to keep it honest. Module 8 is the capstone: you'll assemble all of it to train a network end-to-end, from random noise to a working classifier, and make the architecture and training choices yourself.

One-sentence summary: a network flexible enough to draw any boundary can memorize its training data instead of learning the real pattern (overfitting), which you spot as a growing gap between falling training loss and rising validation loss — and fight with weight decay, dropout, and early stopping.
