# Multilayer Perceptrons

We’re now ready to introduce the concept of a neural network. You probably realise that “neural” means these networks are somehow connected to the human brain. While it’s true that the initial inspiration for neural networks was indeed the human brain, the model we’re actually going to use is very much simpler than a real brain.

## Neural Networks


The basic building block of a neural network is a neuron. You may have heard of neurons in the brain. The human brain is made up of tens of billions of neurons connected together in a network. A biological neuron is a type of cell that receives multiple inputs via parts called dendrites. Some kind of processing takes place inside the neuron to decide whether it should “fire”, meaning it sends an output signal along its axon. Most of the time, the axon is connected to a dendrite of another neuron, making up the network.

## Artificial Neurons

An artificial neuron is a very simplified model of a biological neuron. Again, it has multiple (\(n\)) input values. Each input value \(x_i\) is multiplied by a different weight \(w_i\). These scaled input values are summed, and a constant called the bias \(b\) is added. Finally, a function (usually a nonlinear function) is applied to this summed value, and that provides the single output value of the neuron.

Mathematically, we can write down the behaviour of a neuron as:

\[ y = f\left(b + \sum_{i=1}^{n} w_i x_i\right) \]

The weights and bias are the parameters of the neuron. These are the values we can adjust to try to reduce the value of our loss function when we are training.
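As a minimal sketch, the formula above can be computed in a couple of lines of NumPy. The values here are arbitrary, chosen purely for illustration, and the identity function stands in for the activation:

```python
import numpy as np

def neuron(x, w, b, f):
    # y = f(b + sum_i w_i * x_i)
    return f(b + np.dot(w, x))

# Arbitrary example values, with the identity as the activation function
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
y = neuron(x, w, b=0.1, f=lambda z: z)
# 0.1 + 0.5*1.0 + (-0.25)*2.0 = 0.1
```

Training adjusts `w` and `b`; the inputs `x` come from the data.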

The function \(f\) is called the activation function. Many different functions have been tried, and they give quite different behaviour. The original neuron model, the perceptron, was proposed by Frank Rosenblatt in the late 1950s, using the following activation function:

\[
f(z) =
\begin{cases}
0 & \text{if } z < 0 \\
1 & \text{otherwise}
\end{cases}
\]

This is a function that outputs zero if its input is negative, otherwise it outputs one. Originally, this single-neuron perceptron was used directly for classification, with the zero or one output indicating the two possible classes. However, this model is extremely limited: it can only separate classes with a single straight boundary, so it cannot solve even seemingly trivial problems such as the XOR function.
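To make the perceptron concrete, here is a small sketch; the weights and bias implementing a logical AND are hand-picked for illustration:

```python
import numpy as np

def step(z):
    # Perceptron activation: 0 for negative input, 1 otherwise
    return 0.0 if z < 0 else 1.0

def perceptron(x, w, b):
    return step(b + np.dot(w, x))

# Hand-picked weights that implement logical AND on {0, 1} inputs:
# the sum exceeds -b only when both inputs are 1
w, b = np.array([1.0, 1.0]), -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(np.array(x, dtype=float), w, b))
```

No choice of `w` and `b` makes this single neuron compute XOR, which is the limitation mentioned above.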

Two key further developments were required. First, it was quickly realised that a single neuron is not powerful enough. Secondly, having an activation function that jumps from 0 to 1 proves problematic for training.

## Multilayer Perceptron

Let’s see first how to combine many neurons into a neural network. A network has one or more input values. Each of these inputs is connected to every neuron in the first hidden layer. The output of every one of these neurons is connected to the inputs of all the neurons in the second hidden layer. After a chosen number of hidden layers, we finally have an output layer, usually comprising a single neuron, whose inputs are the outputs of all neurons in the last hidden layer. We call this a multilayer perceptron (MLP) or fully connected network. Here’s an example:

## Feedforward Networks

This MLP has four inputs, two hidden layers (the first of which has eight neurons, the second of which has four) and a single output. Each neuron applies the function discussed above and has its own weights and bias. This type of network is called a feedforward network since it has no loops (i.e. the output of a neuron never connects to the input of a neuron in an earlier layer). This is different to the brain but makes training possible. We’ll only consider feedforward networks for the rest of this course.
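A forward pass through this 4–8–4–1 network can be sketched as below. The weights are random placeholders, and the activation function is passed in as a parameter since we have not yet chosen one (here `np.tanh` stands in as an example):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b, f):
    # Fully connected: every input feeds every neuron in the layer
    return f(W @ x + b)

# The architecture described above: 4 inputs -> 8 -> 4 -> 1 output
sizes = [4, 8, 4, 1]
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def mlp(x, f):
    # Feedforward: data flows strictly from each layer to the next,
    # never back to an earlier layer
    for W, b in params:
        x = layer(x, W, b, f)
    return x

y = mlp(np.ones(4), f=np.tanh)  # array with a single output value
```

Counting the parameters: (4×8 + 8) + (8×4 + 4) + (4×1 + 1) = 81 weights and biases would be adjusted during training.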

## Rectified Linear Unit

Lastly, we need to decide on the activation function. For many decades, researchers experimented with functions that smoothly transitioned from 0 to 1. However, since about 2011 it’s been found that the best performing choice is a very simple function called the rectified linear unit (ReLU):

\[
f(z) =
\begin{cases}
0 & \text{if } z < 0 \\
z & \text{otherwise}
\end{cases}
\]

As you can see, this effectively stops negative numbers from being passed forward, otherwise it allows the value to pass through unchanged.
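ReLU is a one-liner in NumPy:

```python
import numpy as np

def relu(z):
    # Negative values become 0; non-negative values pass through unchanged
    return np.maximum(z, 0.0)

print(relu(np.array([-2.0, -0.5, 0.0, 3.0])))  # [0. 0. 0. 3.]
```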

Multilayer Perceptrons (MLPs) are still very widely used and, in fact, have recently been rediscovered as a state-of-the-art method for representing implicit functions. See the amazing NeRF paper linked below for a recent example.

### References

1. Rosenblatt, Frank. *The Perceptron: A Perceiving and Recognizing Automaton (Project Para)*. Cornell Aeronautical Laboratory, 1957.
2. Mildenhall, Ben, et al. “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.” European Conference on Computer Vision (ECCV). Springer, Cham, 2020. (https://www.matthewtancik.com/nerf)