
Activation Functions

As explained previously, hidden variables/nodes in a neural network are non-linear functions of the variables/nodes of the previous layer. In basic feed-forward neural networks they are typically some non-linear function of a weighted sum of the variables in the previous layer.

Since we may have more than one hidden layer, we will refer to the jth hidden node in the ith hidden layer as $H^i_j$. With $X^i_j$ we will refer both to hidden nodes, where $X^i_j = H^i_j$ for $i \geq 1$, and, when $i = 0$, to the input features, where $X^0_j = X_j$. We can then give a general equation for all hidden nodes in a network as:

$$H^i_j = f\left( \sum_k w^i_{j,k} \, X^{i-1}_k \right)$$

Where we recall that the weights of the weighted sum, $w^i_{j,k}$, are the edge weights on the edges leading into the hidden node.

[Figure: a close-up of part of a neural network, showing a single hidden node from the first hidden layer, the three input-layer nodes (the first of which is a bias node), and the edges leading into the hidden node, each labelled with its weight. The equation for the value taken by the hidden node is shown; it is simply the general equation above specialised to this node and its three inputs.]
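To make this concrete, here is a minimal sketch in Python (using NumPy) of how the value of a single hidden node could be computed from the previous layer. The particular weights, inputs, and choice of activation function are illustrative assumptions, not values taken from the course.

import numpy as np

def hidden_node_value(x_prev, weights, activation):
    # A hidden node is a non-linear function of the weighted sum
    # of the values in the previous layer.
    weighted_sum = np.dot(weights, x_prev)
    return activation(weighted_sum)

# Illustrative values: three previous-layer values (the first a bias of 1),
# one edge weight per incoming edge, and the logistic function as the
# non-linearity.
x_prev = np.array([1.0, 0.5, -0.2])
weights = np.array([0.1, 0.8, -0.4])
logistic = lambda z: 1.0 / (1.0 + np.exp(-z))

print(hidden_node_value(x_prev, weights, logistic))  # value of the hidden node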

Given such a form, the question becomes what the non-linear function $f$ should be. Typically the same non-linear function is chosen for all hidden nodes in a particular layer, but different layers may have different functions. We look at three common choices here.

The Logistic Function

The logistic function squashes its argument (in this case the weighted sum) between 0 and 1. At 0 the logistic function takes the value 0.5, and it is also at this point that its gradient is steepest.

[Figure: the logistic function plotted over the interval -6 to 6, rising in an S-shaped curve from values close to 0 at -6 to values close to 1 at 6, centred at zero.]
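For reference, the logistic function has the standard closed form

$$\sigma(z) = \frac{1}{1 + e^{-z}},$$

which gives $\sigma(0) = \frac{1}{2}$, matching the graph above.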


The sigmoid function?

The logistic function is sometimes referred to as the sigmoid function. This should be avoided, as a sigmoid function is simply any function that possesses an S-like shape. As such, both the logistic and the tanh functions are examples of sigmoid functions.


The Hyperbolic Tangent Function

The hyperbolic tangent, or tanh, function is similar to the logistic function, but squashes its argument between -1 and 1. At 0 the tanh function takes the value 0, and it is also at this point that its gradient is steepest.

[Figure: the hyperbolic tangent function plotted over the interval -6 to 6, rising in an S-shaped curve from values close to -1 at -6 to values close to 1 at 6, centred at zero.]
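For reference, the hyperbolic tangent function has the standard closed form

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}},$$

which gives $\tanh(0) = 0$, matching the graph above.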

The Rectifier Function

Both the logistic and hyperbolic tangent functions can face the problem that for most values of their argument their gradient is very small. In deep neural networks (networks with many hidden layers) this can lead to 'the vanishing gradient problem', making them difficult to train successfully. In such networks it is common to use alternative activation functions, the most popular of which is the rectifier function. The rectifier simply returns its argument, unless its argument is less than 0, in which case it returns 0.

[Figure: the rectifier function plotted over the interval -6 to 6, equal to 0 over the interval -6 to 0 and equal to its argument over the interval 0 to 6.]
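In symbols, the rectifier can be written as

$$\operatorname{rect}(z) = \max(0, z),$$

so its gradient is 1 for positive arguments and 0 for negative arguments, avoiding the uniformly small gradients that cause the vanishing gradient problem in deep networks.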

Nodes that use the rectifier function as their activation function are often referred to as rectified linear units, or ReLUs.


There are many other activation functions in use, even in basic feed-forward networks of the type we are focusing on in this course. You can view a number of these in the Wikipedia activation function article.


This article is from the free online course:

Advanced Machine Learning

The Open University
