
# Activation functions

A short article describing the use of activation functions in deep learning networks, in particular ReLU.

Much of the power of deep learning networks comes from non-linearity. Loosely speaking, a non-linear relationship is any relationship where the response of an output to an input can’t be represented by a straight line. Although neural networks and CNNs are complex and multidimensional in structure, all the relationships between nodes, or neurons, are linear in nature: you just multiply the values by the weights of the connections and add them up. On its own this is not enough to yield the useful results we want from deep learning, which is where activation functions come in.
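To see what “multiply by the weights and add them up” means, here is a minimal sketch of a single linear neuron in plain Python (the function and variable names are just for illustration):

```python
def neuron_output(inputs, weights, bias):
    """A purely linear neuron: multiply each input by its weight,
    add the products together, then add a bias term."""
    return sum(i * w for i, w in zip(inputs, weights)) + bias

# (0.5 * 1.0) + (-1.0 * 2.0) + (0.25 * 3.0) + 0.1 is approximately -0.65
print(neuron_output([1.0, 2.0, 3.0], [0.5, -1.0, 0.25], 0.1))
```

However many of these neurons you stack, combinations of weighted sums are still just weighted sums, which is why a non-linear activation function is needed.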

## Activation functions

So what is an activation function? All it is is a mathematical relationship, or function, from one number (the input, or x) to another (the output, or y). You can think of it as the way an output changes as an input increases. A simple example of a function is the straight line y = mx + c (with m the gradient and c a constant), but since we want to introduce some non-linearity to the system, this is no good to us.

An example of an activation function used in machine learning is the sigmoid function, often denoted \(\sigma\):

\[\sigma(x) = \frac{1}{1+e^{-x}}\]

which looks like this:

Don’t worry about the formula; the important thing is the shape. For values well below zero the function is close to zero, and for values well above zero it is close to one. Only inputs in a small range around zero give intermediate values.
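The formula translates directly into a few lines of Python using only the standard library (the function name is just for this sketch):

```python
import math

def sigmoid(x):
    """Sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(-10))  # very close to 0
print(sigmoid(0))    # exactly 0.5
print(sigmoid(10))   # very close to 1
```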

Another activation function sometimes used is the hyperbolic tangent function, or tanh:

\[\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]

which looks like this:

Again, don’t worry about the formula, the important thing to notice is the shape. It’s similar to the sigmoid function, but rather than varying between zero and one, it varies from minus one to one.
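You can check this formula against Python’s built-in `math.tanh`, and confirm the wider output range (the hand-built function name is just for illustration):

```python
import math

def tanh_from_formula(x):
    """Hyperbolic tangent built from exponentials, as in the formula above."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# The hand-built version agrees with the standard library implementation,
# and large negative inputs give values close to -1 rather than 0.
print(tanh_from_formula(1.0), math.tanh(1.0))
print(tanh_from_formula(-5.0))  # very close to -1
```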

## ReLU

You might encounter the tanh or sigmoid functions elsewhere if you’re studying machine learning more broadly, but the most important activation function for deep learning is probably the ReLU function. This stands for Rectified Linear Unit, which sounds complicated but is actually quite simple. It just returns zero for values of x below zero, and x itself for values above zero. So it’s like an IF statement that gives you the same number back for positive numbers, and zero back for negative numbers.

You can express this more formally as follows:

\[\mathrm{ReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}\]

And if you plot it, it looks like this:

Most of the CNNs we will look at in this course make regular use of ReLU activation functions, in particular after convolutional or fully connected layers. Rather than being calculated for a single number, as in the plots above, the function is applied to large arrays of numbers at the same time.

If you think about it, all it’s really doing is taking an array of numbers, which may be positive or negative, setting all the negative numbers to zero, and leaving the others unchanged.
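A minimal sketch of this in plain Python (the names are just for illustration):

```python
def relu(x):
    """ReLU: zero for negative inputs, the input itself otherwise."""
    return max(0.0, x)

# A CNN applies this element-wise to whole arrays of numbers at once;
# here we do the same with a plain Python list:
activations = [-3.5, -1.0, 0.0, 2.0, 4.5]
print([relu(a) for a in activations])  # [0.0, 0.0, 0.0, 2.0, 4.5]
```

Negative values are set to zero and the rest pass through unchanged, exactly as described above.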

## Activation functions in CNNs

We’ll see how activation functions are used within CNNs later this week. Before that though, we need to take a closer look at tensors as we will do in the next activity.

Images (c) The University of Nottingham