Activation Functions

Without activation functions, neural networks can only learn linear relationships. In order to fit curves, we'll need activation functions.

An activation function is simply some function we apply to each of a layer's outputs (its activations). The most common activation function is the rectifier function \(\max(0, x)\).

A graph of the rectifier function. The line y=x when x>0 and y=0 when x<0, making a 'hinge' shape like '_/'.

Rectifier function

The rectifier function has a graph that's a line with the negative part "rectified" to zero. Applying the function to the outputs of a neuron will put a bend in the data, moving us away from simple lines.
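To see the "rectifying" behavior concretely, here is a minimal sketch of the rectifier in NumPy (the sample input values are illustrative, not from the text):

```python
import numpy as np

def rectifier(x):
    # max(0, x), applied elementwise: negative inputs become 0,
    # positive inputs pass through unchanged.
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(rectifier(x))  # the negative entries are "rectified" to 0
```

Note that `np.maximum` compares elementwise, so the same function works on a single number or a whole layer of activations at once.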

When we attach the rectifier to a linear unit, we get a Rectified Linear Unit or ReLU. Applying a ReLU activation function to a linear unit means the output becomes \(\max(0, w \cdot x + b)\), which we might draw in a diagram like:

Diagram of a single ReLU. Like a linear unit, but instead of a '+' symbol we now have a hinge '_/'.
A rectified linear unit.
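A single ReLU unit can be sketched directly from the formula \(\max(0, w \cdot x + b)\). The weight vector `w` and bias `b` below are illustrative values chosen for this example, not from the text:

```python
import numpy as np

def relu_unit(x, w, b):
    # A single rectified linear unit: max(0, w . x + b).
    # The linear part (w . x + b) is computed first, then rectified.
    return max(0.0, np.dot(w, x) + b)

w = np.array([0.5, -1.0])  # illustrative weights
b = 0.25                   # illustrative bias

print(relu_unit(np.array([2.0, 1.0]), w, b))  # 0.5*2 - 1*1 + 0.25 = 0.25
print(relu_unit(np.array([0.0, 2.0]), w, b))  # -1.75, rectified to 0.0
```

The second input shows the hinge in action: the linear part is negative, so the unit outputs zero instead of a negative value.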