Shallow Neural Networks

A shallow neural network is the simplest neural network architecture, as shown in the figure below. We introduce the mathematical formulation of the shallow neural network and, in particular, explain the building block of a neural network. After reading this article, you will know what a shallow neural network is and how to interpret illustrative figures of neural networks.

Model architecture

A shallow neural network is a non-linear model that defines a mapping from the input space \(\mathbb{R}^d\) to the output space \(\mathbb{R}^{e}\). It is composed of one input layer \(h^{(0)}\), one hidden layer \(h^{(1)}\) and one output layer \(h^{(2)}\). The \(l^{th}\) layer is defined as a transformation from \(\mathbb{R}^d\) to \(\mathbb{R}^{n_l}\), where \(n_l\) denotes the number of neurons in the \(l^{th}\) layer and \(l \in \{1, 2\}\).

  • Input Layer (\(h^{(0)}: \mathbb{R}^{d} \rightarrow \mathbb{R}^{d}\)): \(\forall x \in \mathbb{R}^{d}\),

\[x = (x^{(1)}, x^{(2)}, \cdots, x^{(d)}) \mapsto x,\]

where the input layer \(h^{(0)}\) is the identity map, and the neurons at the input layer are the components of the input \(x\).

  • Hidden Layer (\(h^{(1)}: \mathbb{R}^{d} \rightarrow \mathbb{R}^{n_{1}}\)): \(\forall x \in \mathbb{R}^{d}\),

\[z^{(1)}(x) = x W^{(1)} + b^{(1)}, \quad h^{(1)}(x) = \sigma_{1}(z^{(1)}(x)),\]

where \(x\) is treated as a row vector, \(W^{(1)}\) is a \(d \times n_1\) matrix of weights, \(b^{(1)}\) is an \(n_1\)-dimensional vector, and \(\sigma_{1}\) is called the activation function at the hidden layer and is applied elementwise.

  • Output Layer (\(h^{(2)}: \mathbb{R}^{d} \rightarrow \mathbb{R}^{e}\)): \(\forall x \in \mathbb{R}^{d}\),

\[z^{(2)}(x) = h^{(1)}(x) W^{(2)} + b^{(2)}, \quad h^{(2)}(x) = \sigma_{2}(z^{(2)}(x)),\]

where \(W^{(2)}\) is an \(n_{1} \times n_{2}\) matrix of weights with \(n_2 = e\), \(b^{(2)}\) is an \(n_2\)-dimensional vector, and \(\sigma_{2}\) is called the activation function at the output layer. It is often chosen as the identity map for regression problems and as the softmax function for classification problems.
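The three layers above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a reference implementation: the helper names (`affine`, `softmax`, `identity`, `shallow_forward`) are hypothetical, the choice of `tanh` as the hidden activation is an assumption (the article does not fix \(\sigma_1\)), and the weights in the example are arbitrary toy values.

```python
import math

def affine(x, W, b):
    """Return x W + b for a row vector x (length m), a matrix W (m x n) and a bias b (length n)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def identity(z):
    """Output activation commonly used for regression."""
    return list(z)

def softmax(z):
    """Output activation commonly used for classification (max subtracted for stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def shallow_forward(x, W1, b1, W2, b2, sigma1=math.tanh, sigma2=identity):
    """Forward pass of a shallow network: input layer -> hidden layer -> output layer."""
    h0 = list(x)                      # input layer: identity map
    z1 = affine(h0, W1, b1)           # z^(1)(x) = x W^(1) + b^(1)
    h1 = [sigma1(v) for v in z1]      # sigma_1 applied elementwise
    z2 = affine(h1, W2, b2)           # z^(2)(x) = h^(1)(x) W^(2) + b^(2)
    return sigma2(z2)                 # sigma_2 acts on the whole output vector

# Toy example with d = 3 inputs, n_1 = 4 hidden neurons and e = 2 outputs.
x = [1.0, 2.0, 3.0]
W1 = [[0.1, -0.2, 0.3, 0.0],
      [0.0, 0.1, -0.1, 0.2],
      [0.2, 0.0, 0.1, -0.3]]          # d x n_1
b1 = [0.0, 0.1, -0.1, 0.0]            # n_1
W2 = [[0.5, -0.5],
      [0.1, 0.2],
      [-0.3, 0.4],
      [0.0, 0.1]]                     # n_1 x n_2
b2 = [0.0, 0.0]                       # n_2

regression_output = shallow_forward(x, W1, b1, W2, b2)
class_probabilities = shallow_forward(x, W1, b1, W2, b2, sigma2=softmax)
print(regression_output, class_probabilities)
```

Note that the hidden activation is applied to each neuron separately, whereas softmax needs the whole output vector at once, which is why `sigma2` takes a list rather than a single number in this sketch.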

Neural network building block

Let us focus on the building block of neural networks and its graphic illustration. For simplicity, we consider a hidden layer with a single scalar output \(h_1 \in \mathbb{R}\).

Let \(W_{i, 1}\) denote the linear weight between the incoming \(i^{th}\) node at the input layer and the outgoing node at the hidden layer.

The hidden neuron \(h_1\) is obtained by applying the activation function \(\sigma\) to the intermediate variable \(z_1\), i.e.,

\[z_{1} = \sum_{i = 1}^{d} W_{i, 1} x^{(i)} + b_{1}, \quad h_{1} = \sigma(z_{1}),\]

where the sum runs over the \(d\) input nodes. Here \(W, b\) are model parameters to be learned from data.
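As a small illustration of this building block, the following sketch computes one hidden neuron from an input vector. The function name `neuron`, the use of `tanh` as \(\sigma\), and the numerical values are all assumptions chosen for the example; the weighted sum runs over the components of the input \(x\).

```python
import math

def neuron(x, w, b, sigma=math.tanh):
    """Return (z, h): the weighted sum z = sum_i w[i] * x[i] + b and the activation h = sigma(z)."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return z, sigma(z)

x = [1.0, 2.0, 3.0]        # input nodes x^(1), x^(2), x^(3)
w = [0.5, -1.0, 0.5]       # weights W_{1,1}, W_{2,1}, W_{3,1} (toy values)
b = 0.25                   # bias b_1 (toy value)

z1, h1 = neuron(x, w, b)   # z1 = 0.5 - 2.0 + 1.5 + 0.25 = 0.25
print(z1, h1)
```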

For simplicity, we often omit the model weights in the graphic illustration of neural networks. Let us recall the first figure of a shallow neural network. You should now be able to write down the precise mathematical definition of the neural network model in the figure and compute its number of model parameters. Check whether you can calculate the total number of model parameters: \((3+1) \times 4 + (4+1) \times 2 = 26\). Think about the general formula for the total number of model parameters of an artificial neural network.
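One way to check the count is the short sketch below. It assumes fully connected layers with one bias per neuron, so that a layer with \(n_{l-1}\) inputs and \(n_l\) neurons contributes \((n_{l-1} + 1) \times n_l\) parameters, and the total is \(\sum_{l} (n_{l-1} + 1) \, n_l\) with \(n_0 = d\). The layer sizes 3, 4 and 2 are read off from the arithmetic in the text, and the function name `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first,
    e.g. [3, 4, 2] for 3 inputs, 4 hidden neurons and 2 outputs.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

total = count_parameters([3, 4, 2])   # (3 + 1) * 4 + (4 + 1) * 2
print(total)                          # 26
```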

This article is from the free online

An Introduction to Machine Learning in Quantitative Finance

Created by
FutureLearn - Learning For Life
