Module 2 — Activation Functions
In Module 1 every neuron ended with a squash, \( \sigma \). That squash, the activation function, looks like a small detail. It is in fact what makes depth worth having: without it, extra layers add nothing. This module shows you the common activations and then demonstrates, with an interactive demo, why a network without one is no more powerful than a single linear layer.
The three you’ll see everywhere
- Sigmoid: \( \sigma(z) = \frac{1}{1+e^{-z}} \). Squashes any number into \( (0,1) \). Its output reads as a probability, which makes it the usual output-layer choice for yes/no problems.
- Tanh: \( \tanh(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}} \). Like sigmoid, but it ranges over \( (-1, 1) \) and is centered at zero, which often makes hidden layers train a little better than they do with sigmoid.
- ReLU: \( \text{ReLU}(z) = \max(0, z) \). Brutally simple: pass positives through, zero out negatives. It is cheap to compute, doesn’t saturate for large positive inputs, and is the default hidden activation that powered the deep-learning boom.
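The three definitions above can be sketched in a few lines of NumPy. This is a minimal illustration rather than how any particular library implements them; the function names and the sample inputs are chosen here for the example.

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^(-z)): squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Zero-centered squash into (-1, 1); NumPy provides it directly
    return np.tanh(z)

def relu(z):
    # Pass positives through unchanged, zero out negatives
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])  # sample inputs, chosen for illustration

sig_out = sigmoid(z)
tanh_out = tanh(z)
relu_out = relu(z)

print("sigmoid:", sig_out)
print("tanh:   ", tanh_out)
print("relu:   ", relu_out)
```

At an input of zero, sigmoid returns 0.5 and tanh returns 0, which is the zero-centering the list mentions. ReLU maps the negative input to 0 and leaves the positive one unchanged.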
Plot each one and drag the input to watch the output respond.
The whole point: bending the line
Here is the deep reason activations matter. Stack two linear layers — a weighted sum feeding another weighted sum — and the result is… still just a weighted sum. A line of lines is a line. No matter how many linear layers you pile up, you can only ever draw a straight boundary. The activation is the only non-linear step, and it is what lets each layer bend what the last one produced. Bends stacked on bends are how a network carves out any shape at all.
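The collapse can be checked numerically. Algebraically, \( W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2) \), so two linear layers are one linear layer with different weights. The sketch below assumes small random layers; the shapes, the seed, and the variable names are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Two linear layers with no activation: 3 inputs -> 4 hidden -> 2 outputs
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Run the two layers one after the other
stacked = W2 @ (W1 @ x + b1) + b2

# Fold them into a single layer: W = W2 W1, b = W2 b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
single = W @ x + b

collapses = np.allclose(stacked, single)
print("two linear layers equal one:", collapses)

# With a ReLU between the layers, the folded layer generally no longer matches
with_relu = W2 @ np.maximum(0.0, W1 @ x + b1) + b2
print("linear stack:", stacked)
print("with ReLU:   ", with_relu)
```

The folded weights `W` and `b` reproduce the stacked output for any input, not just this `x`, which is why adding more linear layers never adds expressive power.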
Below, three ReLU neurons feed one output, and you set how much each contributes. Try to match the wavy target. Then flip the activation to linear — and watch your carefully-built curve snap back to a dead-straight line, no matter how you set the knobs.
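The same experiment can be sketched offline. The code below assumes a tiny network of three hidden neurons feeding one output; the slopes, offsets, and output weights are made up for the illustration and are not the values used by the interactive demo. It measures "bending" with second differences, which are zero everywhere for a straight line sampled on an even grid.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def linear(z):
    return z

def tiny_net(x, out_weights, activation):
    # Three hidden neurons, each with its own slope and offset.
    # Hypothetical values, chosen only for this sketch.
    slopes = np.array([1.0, 1.0, 1.0])
    offsets = np.array([1.0, 0.0, -1.0])
    hidden = activation(np.outer(x, slopes) + offsets)  # shape (len(x), 3)
    return hidden @ out_weights  # weighted sum of the three neurons

x = np.linspace(-3.0, 3.0, 13)       # evenly spaced inputs
knobs = np.array([1.0, -2.0, 1.5])   # how much each neuron contributes

y_relu = tiny_net(x, knobs, relu)
y_linear = tiny_net(x, knobs, linear)

# Largest second difference: zero (up to rounding) means a straight line
bends_relu = np.abs(np.diff(y_relu, n=2)).max()
bends_linear = np.abs(np.diff(y_linear, n=2)).max()

print("ReLU bend size:  ", bends_relu)
print("linear bend size:", bends_linear)
```

With ReLU, each neuron contributes a kink where its input crosses zero, so the output is a piecewise-linear curve. With the linear activation the three neurons sum to a single line (here \( 0.5x - 0.5 \)), and no choice of `knobs` can change that.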
from tensorflow.keras.layers import Dense

Dense(16, activation='relu')     # hidden layer: ReLU is the usual default
Dense(16, activation='tanh')     # tanh: zero-centered alternative
Dense(1, activation='sigmoid')   # output: a 0–1 probability for yes/no
Remove the activation from every layer and the whole network collapses to one linear model — exactly what the slider above demonstrates.
Check your understanding
A few questions about activations. You will get a score.