Activation functions

A short article describing the use of activation functions in convolutional neural networks

A quick overview of the important activation functions used in deep learning, in particular ReLU.

Much of the power of deep learning networks comes from non-linearity. Loosely speaking, non-linearity is any relationship where the response of an output to an input can’t be represented by a straight line. Although neural networks and CNNs are complex and multidimensional in structure, the relationships between nodes, or neurons, are linear in nature: you just multiply the values by the weights of the connections and add them up. On its own this is not enough to yield the useful results we want from deep learning, so this is where activation functions come in.

Activation functions

So what is an activation function? It is simply a mathematical relationship, or function, taking one number (the input, or X) to another (the output, or Y). You can think of it as describing the way the output changes as the input increases. One example of such a function is just a straight line, y = mx + c (with m the gradient and c a constant), but since we want to introduce some non-linearity to the system, this is no good to us.
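To see why purely linear functions fall short, here is a minimal Python sketch (not part of the course materials; the function names and numbers are made up for illustration) showing that chaining two straight-line functions together only ever produces another straight line:

# Two arbitrary straight-line functions (illustrative names and values only)
def linear1(x):
    return 2.0 * x + 1.0      # y = 2x + 1

def linear2(x):
    return -0.5 * x + 3.0     # y = -0.5x + 3

def composed(x):
    # Chaining them gives -0.5 * (2x + 1) + 3 = -x + 2.5: still a straight line
    return linear2(linear1(x))

for x in (-2.0, 0.0, 2.0):
    print(x, composed(x))     # (-2.0, 4.5), (0.0, 2.5), (2.0, 0.5)

However many such layers you stack, the result stays linear, which is exactly the limitation activation functions are there to break.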

An example of an activation function used in machine learning is the sigmoid function, often denoted \(\sigma\):

\(\sigma(x) = \frac{1}{1+e^{-x}}\)

It looks like this:

A 2D line plot of the sigmoid function against the variable X, ranging from -10 to 10. It is roughly a slanted S shape, with the value of the function on the y-axis close to zero for large negative values of X, and close to one for large positive values of X. At X equals zero the function equals 0.5.

Don’t worry about the formula; the important thing is the shape. For most values below zero the function is very close to zero, and for most values above zero it is very close to one. Only inputs in a small range around zero give intermediate values.
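If you would like to try it in code, here is a minimal sketch (assuming NumPy, which the article itself doesn’t reference) of the sigmoid function applied to a few values:

import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))
# approximately [0.0000454, 0.5, 0.9999546]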

Another activation function sometimes used is the hyperbolic tangent function, or tanh:

\(\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\)

It looks like this:

A 2D line plot of the hyperbolic tangent function against the variable X, ranging from -10 to 10. The plot has a similar shape to the sigmoid function, except that the range of values the function takes goes from -1 to +1, crossing both the X and Y axes at zero.

Again, don’t worry about the formula; the important thing to notice is the shape. It’s similar to the sigmoid function, but rather than varying between zero and one, it varies between minus one and one.
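As a quick illustrative check (again assuming NumPy, not course code), the built-in tanh function shows the same S shape but with outputs between -1 and +1:

import numpy as np

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(np.tanh(x))   # approximately [-1.0, -0.762, 0.0, 0.762, 1.0]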

ReLU

You might encounter the tanh or sigmoid functions elsewhere if you’re studying machine learning more broadly, but the most important activation function for deep learning is probably the ReLU function. This stands for Rectified Linear Unit, which sounds complicated but is actually quite simple. It just returns zero for values of X below zero, and X itself for values at or above zero. So it’s like an IF statement that gives you the same number back for positive numbers, and zero back for negative numbers.

You can express this more formally as follows:

\(\mathrm{ReLU}(X) = \begin{cases} X & \text{if } X \geq 0 \\ 0 & \text{otherwise} \end{cases}\)

And if you plot it, it looks like this:

A 2D line plot of the ReLU function against the variable X, ranging from -10 to 10. For values less than zero, the function is always zero. For values greater than zero, the function is a straight line equal to the value of X.
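Expressed in code (a minimal sketch assuming NumPy, which the article doesn’t mention), ReLU is essentially a one-liner:

import numpy as np

def relu(x):
    # Element-wise maximum of x and zero: negatives become zero, the rest pass through
    return np.maximum(x, 0.0)

print(relu(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# [ 0.  0.  0.  1. 10.]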

Most of the CNNs we will look at in this course make regular use of ReLU activation functions, in particular after convolutional or fully connected layers. Rather than being calculated for a single number, as in the plots above, the function is applied to large arrays of numbers at the same time.

If you think about it, all it’s really doing is taking an array of numbers, which may be positive or negative, setting all the negative numbers to zero, and leaving the others unchanged.
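As a small illustration (again assuming NumPy, with a made-up 3-by-3 array standing in for a feature map), applying ReLU element-wise zeroes out the negative entries and keeps everything else:

import numpy as np

feature_map = np.array([[-2.0,  0.5, -1.5],
                        [ 3.0, -0.1,  2.2],
                        [-4.0,  1.0,  0.0]])

print(np.maximum(feature_map, 0.0))
# [[0.  0.5 0. ]
#  [3.  0.  2.2]
#  [0.  1.  0. ]]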

Activation functions in CNNs

We’ll see how activation functions are used within CNNs later this week. Before that, though, we need to take a closer look at tensors, which we will do in the next activity.

Images (c) The University of Nottingham
