
Module 2 — Activation Functions

The building block · hands-on · about 25 minutes.

In Module 1 every neuron ended with a squash, \( \sigma \). That squash, the activation function, looks like a small detail, but it is the reason stacking layers buys a network anything at all. This module introduces the common activations and then shows, in an interactive demo, why a network without one can only ever fit a straight line.

The three you’ll see everywhere

Plot each one and drag the input to watch the output respond.
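For reference, the three activations fit in a few lines. This is a minimal sketch in plain Python, assuming the three are ReLU, tanh, and sigmoid as in the Keras listing later in this module; the function names are illustrative and not taken from any library.

```python
import math


def relu(x):
    """Pass positive inputs through unchanged; clamp negatives to 0."""
    return max(0.0, x)


def tanh(x):
    """Squash any input into the range (-1, 1), centered on 0."""
    return math.tanh(x)


def sigmoid(x):
    """Squash any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Print each activation at a negative, a zero, and a positive input.
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  tanh={tanh(x):+.3f}  sigmoid={sigmoid(x):.3f}")
```

Note how ReLU is unbounded above, while tanh and sigmoid flatten out toward fixed limits as the input grows.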


The whole point: bending the line

Here is the deep reason activations matter. Stack two linear layers — a weighted sum feeding another weighted sum — and the result is… still just a weighted sum. A line of lines is a line. No matter how many linear layers you pile up, you can only ever draw a straight boundary. The activation is the only non-linear step, and it is what lets each layer bend what the last one produced. Bends stacked on bends are how a network carves out any shape at all.
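The collapse can be checked by hand. For one input and scalar weights, \( w_2(w_1 x + b_1) + b_2 = (w_2 w_1)x + (w_2 b_1 + b_2) \), which is again a single weighted sum plus a bias; the same algebra holds when the weights are matrices. The sketch below confirms it numerically. The symbols and the weight values are arbitrary choices made for illustration.

```python
import math

# Arbitrary illustrative weights and biases for two linear layers.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5


def two_linear_layers(x):
    """Layer 1 feeds layer 2, with no activation in between."""
    hidden = w1 * x + b1
    return w2 * hidden + b2


# The equivalent single layer, obtained by multiplying out the expression.
w_merged = w2 * w1
b_merged = w2 * b1 + b2


def one_linear_layer(x):
    return w_merged * x + b_merged


for x in (-1.0, 0.0, 2.5):
    stacked = two_linear_layers(x)
    merged = one_linear_layer(x)
    print(f"x={x:+.1f}  stacked={stacked:+.2f}  merged={merged:+.2f}")
    assert math.isclose(stacked, merged)
```

Adding a third or a tenth linear layer only repeats the same merge, so depth alone never adds a bend.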

Below, three ReLU neurons feed one output, and you set how much each contributes. Try to match the wavy target. Then flip the activation to linear — and watch your carefully-built curve snap back to a dead-straight line, no matter how you set the knobs.
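The same experiment can be sketched offline. The hinge positions and output weights below are made-up values standing in for the knobs, not the ones the activity uses; the only point is that the ReLU version changes slope while the linear version cannot.

```python
def relu(z):
    return max(0.0, z)


def linear(z):
    return z


# Three hidden neurons, each computing activation(x - hinge).
HINGES = [-1.0, 0.0, 1.0]        # hypothetical: where each ReLU switches on
OUT_WEIGHTS = [1.0, -2.0, 2.0]   # hypothetical: how much each neuron contributes


def network(x, activation):
    """Weighted sum of the three hidden neurons' outputs."""
    return sum(w * activation(x - h) for w, h in zip(OUT_WEIGHTS, HINGES))


def slopes(activation, xs):
    """Slope of the network's output between consecutive sample points."""
    ys = [network(x, activation) for x in xs]
    return [round((ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]), 6)
            for i in range(len(xs) - 1)]


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
print("ReLU slopes:  ", slopes(relu, xs))    # slope changes at every hinge
print("linear slopes:", slopes(linear, xs))  # one slope everywhere
```

With ReLU the slope differs from segment to segment, which is what lets the output trace a wavy target. With the linear activation every segment has the same slope, whatever values are chosen for the weights.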


Activations in Keras — read only, nothing to install
from tensorflow.keras.layers import Dense

Dense(16, activation='relu')      # hidden layer — ReLU is the usual default
Dense(16, activation='tanh')      # tanh: zero-centered alternative
Dense(1,  activation='sigmoid')   # output: a 0–1 probability for yes/no

Remove the activation from every layer and the whole network collapses to one linear model — exactly what the slider above demonstrates.

AI anchor: the boom ReLU unlocked

For years, deep networks were hard to train because sigmoid and tanh "saturate": their slope goes nearly flat for large positive or negative inputs, so the learning signal shrank toward zero as it passed back through a deep stack. The almost-too-simple ReLU largely fixed this. Its slope is exactly 1 for every positive input (and 0 for negative ones), so gradients pass through active units undiminished, even across many layers. That change, together with more data and faster GPUs, is a big part of why deep learning suddenly worked around 2012. The squash you’re dragging is load-bearing history.
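The saturation argument can be made concrete with a back-of-the-envelope sketch. Backpropagation multiplies in one activation slope per layer, so the code below multiplies those slopes across a stack. It ignores the weights entirely and assumes every layer sees the same pre-activation value, which is a simplification for illustration rather than a model of real training; the depth and input values are arbitrary.

```python
import math


def sigmoid_slope(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)              # never exceeds 0.25 (reached at z = 0)


def tanh_slope(z):
    return 1.0 - math.tanh(z) ** 2    # at most 1, shrinks quickly as |z| grows


def relu_slope(z):
    return 1.0 if z > 0 else 0.0      # exactly 1 for any positive input


def signal_after(slope_fn, z, depth):
    """Product of `depth` identical activation slopes (weights ignored)."""
    return slope_fn(z) ** depth


DEPTH = 10  # hypothetical number of stacked layers
for name, fn in (("sigmoid", sigmoid_slope), ("tanh", tanh_slope), ("relu", relu_slope)):
    print(f"{name:8s} z=2.0 -> {signal_after(fn, 2.0, DEPTH):.3e}")
```

Even at its best case of \( z = 0 \), the sigmoid slope is 0.25, so ten layers scale the signal by at most \( 0.25^{10} \), roughly one millionth. The ReLU factor stays at 1 for any positive input.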

Check your understanding

A few questions about activations. You will get a score.


Why this matters next

You now have neurons that bend. Module 3 wires a stack of them into a real network and pushes data all the way through (the forward pass), so you can watch a hidden layer turn raw inputs into features the output neuron can finally separate, producing a curved decision boundary.

One-sentence summary: an activation function adds the non-linearity that lets each layer bend the last one’s output; without it, any stack of layers collapses into a single straight-line model, which is why every useful network has activations (ReLU, tanh, sigmoid) between its layers.
