Module 7 — Training Real Networks
Module 6 showed that a deep network is powerful enough to draw almost any boundary. That power has a dark side: a network flexible enough to wrap a spiral is also flexible enough to memorize the exact training points, noise and all, and then fail on data it has never seen. This is overfitting, the single biggest practical problem in training real networks. This module shows you how to spot it and introduces three standard tools to fight it.
Spotting overfitting: two curves, not one
The trick is to hold out some data the network never trains on — a validation set — and watch its loss separately. While training:
- Training loss keeps falling: the network fits the data it can see ever more closely.
- Validation loss falls at first, then turns back up: the network is now memorizing noise that doesn't generalize.
- The gap between the two curves measures overfitting. A small gap is healthy; a wide, growing gap means the network is memorizing.
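To make the two-curve idea concrete, here is a minimal sketch in plain Python. The helper names `loss_gap` and `best_epoch` are hypothetical, and the loss values are made up for illustration; they are not from a real training run.

```python
def loss_gap(train_losses, val_losses):
    """Return the per-epoch gap (validation loss minus training loss)."""
    return [v - t for t, v in zip(train_losses, val_losses)]


def best_epoch(val_losses):
    """Return the 0-based epoch where validation loss is lowest."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


# Made-up loss values for illustration only (not from a real run).
train = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08]
val = [0.95, 0.70, 0.55, 0.50, 0.58, 0.70]

gaps = loss_gap(train, val)
turn = best_epoch(val)
for epoch, (t, v, g) in enumerate(zip(train, val, gaps)):
    note = "  <- validation loss bottoms out" if epoch == turn else ""
    print(f"epoch {epoch}: train={t:.2f}  val={v:.2f}  gap={g:.2f}{note}")
```

In this toy data the training loss falls every epoch, while the validation loss is lowest at epoch 3 and the gap keeps widening afterwards, which is the pattern the list above describes.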
Three tools keep the gap small:
- Weight decay (L2): adds a penalty that gently pulls every weight toward zero, so the network prefers simpler, smoother boundaries.
- Dropout: randomly switches off some neurons on each training step, so no single neuron can carry a memorized fact. At prediction time every neuron is active.
- Early stopping: stop training once validation loss stops improving, so training ends near the point where it bottoms out instead of after it has turned back up.
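The first two tools are simple enough to write out by hand. The sketch below is one possible illustration, not how any particular library implements them. It assumes plain SGD, an L2 penalty of the form `l2_strength * sum(w**2)`, and the common "inverted" form of dropout; the function names are hypothetical.

```python
import random


def sgd_step(weights, grads, lr=0.1, l2_strength=0.0):
    """One plain SGD step on loss + l2_strength * sum(w**2).

    The penalty adds 2 * l2_strength * w to each gradient, which shrinks
    every weight toward zero a little on every step.
    """
    return [w - lr * (g + 2.0 * l2_strength * w)
            for w, g in zip(weights, grads)]


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and scale
    the survivors by 1 / (1 - rate) so the expected value is unchanged.
    With training=False the input passes through untouched."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# With a zero data gradient only the penalty acts: weights decay toward 0.
w = [2.0, -1.0, 0.5]
for _ in range(3):
    w = sgd_step(w, grads=[0.0, 0.0, 0.0], lr=0.1, l2_strength=0.5)
print("weights after 3 decay-only steps:", w)

rng = random.Random(0)  # fixed seed so the run is repeatable
print("training :", dropout([1.0] * 8, rate=0.3, rng=rng))
print("inference:", dropout([1.0] * 8, rate=0.3, rng=rng, training=False))
```

With these illustrative settings each decay-only step multiplies every weight by 0.9, so large weights survive only if the data gradient keeps pushing them back up.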
Below, train a deliberately oversized network on a small, noisy dataset and watch the two curves split apart. Then flip on weight decay and dropout and rerun — the gap shrinks, and the validation loss (the one that actually matters) drops.
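from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping

# X, y: your training inputs and labels, defined elsewhere
model = Sequential()
model.add(Dense(32, activation='relu', kernel_regularizer=l2(1e-3)))  # weight decay
model.add(Dropout(0.3))                                               # dropout: drop 30% of neurons
model.add(Dense(1, activation='sigmoid'))      # output layer (assumes binary labels)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=200,                        # upper bound (illustrative); early stopping may end sooner
          validation_split=0.3,                    # hold out 30% to watch
          callbacks=[EarlyStopping(patience=10)])  # stop when val stops improving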
All three tools are one line each. The validation_split is what produces the second curve you'll watch below; the rest keep the gap between the curves from blowing open.
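The `patience` setting is easier to reason about with a small simulation. The sketch below is a simplified patience rule written in plain Python, not Keras's actual implementation; the function name `early_stopping_epoch` and the validation losses are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops at the first epoch where validation loss has failed
    to improve for `patience` epochs in a row. If that never happens,
    stop_epoch is the last epoch.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: improve, then turn back up.
val_curve = [0.95, 0.70, 0.55, 0.50, 0.52, 0.51, 0.58, 0.70]
stop, best = early_stopping_epoch(val_curve, patience=3)
print(f"stopped at epoch {stop}, best validation loss at epoch {best}")
```

Note that training stops `patience` epochs after the best one, so the weights at the stopping point are not the best weights seen. Keras's `EarlyStopping` has a `restore_best_weights` argument intended for this situation.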
Check your understanding
A few questions about overfitting and regularization. You will get a score.