

Backpropagation of Deep Neural Networks

In this video, Dr Alex Tse will introduce the backpropagation algorithm, which is an efficient method to compute the gradients of DNNs.

In Step 3.9, you learned three gradient descent-based methods for optimising model parameters in supervised learning. Hence, for training a deep neural network, the problem boils down to computing the gradients efficiently. In this video, Dr Alex Tse will introduce the backpropagation algorithm, which enables an efficient calculation of the gradient by exploiting the hierarchical structure of DNNs.

Let \(h^{(L)}\) denote the output layer of a DNN with model parameters \(\theta := (W^{(l)}, b^{(l)})_{l}\). Recall that the loss function \(L(\theta \vert \mathcal{D})\) to be minimised takes the additive form \(L(\theta \vert \mathcal{D}) = \frac{1}{N}\sum_{i = 1}^{N} Q_{\theta}(x_i, y_i)\), where \(Q_{\theta}(x, y) = Q(h^{(L)}(x) - y)\).

  • Problem: compute \(\nabla Q_{\theta}(x, y)\), i.e. \(\partial_{W^{(l)}_{i, j}} Q_{\theta}(x, y)\) and \(\partial_{b^{(l)}_{i}} Q_{\theta}(x, y)\).
  • Solution: recursion and the chain rule.
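To see the chain rule at work, consider a single neuron with scalar parameters, \(Q_{\theta}(x, y) = Q(\sigma(wx + b) - y)\) with \(Q(u) = \frac{1}{2}u^2\). A minimal sketch (the tanh activation and all numeric values are illustrative, not from the article) checks the analytic gradient against a finite difference:

```python
import numpy as np

# One neuron, squared-error loss: Q_theta(x, y) = 0.5 * (sigma(w*x + b) - y)**2.
# Chain rule: dQ/dw = (sigma(w*x + b) - y) * sigma'(w*x + b) * x.

def Q(w, b, x, y):
    return 0.5 * (np.tanh(w * x + b) - y) ** 2

def dQ_dw(w, b, x, y):
    z = w * x + b
    # tanh'(z) = 1 - tanh(z)^2
    return (np.tanh(z) - y) * (1.0 - np.tanh(z) ** 2) * x

w, b, x, y = 0.7, -0.2, 1.3, 0.5   # illustrative values
eps = 1e-6
numeric = (Q(w + eps, b, x, y) - Q(w - eps, b, x, y)) / (2 * eps)
print(abs(dQ_dw(w, b, x, y) - numeric) < 1e-8)
```

Backpropagation applies this same recipe layer by layer, reusing the inner derivatives instead of recomputing them.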

The backpropagation algorithm for DNNs is composed of two phases:

  • Forward Phase: Neural network evaluation.
  • Backward Phase: Inductive gradient computation.

We summarise the backpropagation algorithm for DNNs as follows.

Forward pass

For each \(l \in \{0, 1, \cdots, L-1\}\), compute

  • \(z^{(l+1)}(x) = h^{(l)}(x) W^{(l)} + b^{(l)},\)
  • \(h^{(l+1)}(x) = \sigma_{l+1}(z^{(l+1)}(x)).\)
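The two forward-pass recursions above can be sketched in NumPy. This is a minimal illustration: the tanh activation, the layer sizes and the random weights are all assumptions for the example, not part of the article.

```python
import numpy as np

def sigma(z):
    # Illustrative choice: tanh for every activation sigma_l.
    return np.tanh(z)

def forward(x, Ws, bs):
    # h^(0) = x; then z^(l+1) = h^(l) W^(l) + b^(l) and
    # h^(l+1) = sigma(z^(l+1)). Both z's and h's are stored,
    # since the backward pass reuses them.
    hs, zs = [x], []
    for W, b in zip(Ws, bs):
        z = hs[-1] @ W + b
        zs.append(z)
        hs.append(sigma(z))
    return zs, hs

# Toy 3 -> 4 -> 2 network with random weights (illustrative only).
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
bs = [np.zeros(4), np.zeros(2)]
zs, hs = forward(rng.normal(size=3), Ws, bs)
print(hs[-1].shape)  # the output layer h^(L) has 2 units: (2,)
```

Storing every \(z^{(l)}\) and \(h^{(l)}\) trades memory for speed: the backward pass needs them, and recomputing them would double the work.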

Backward pass

For \(l = L\), compute

  • \(\delta^{(L)} = \partial_{z^{(L)}} Q_{\theta}(x, y) = \partial_{z^{(L)}} Q(h^{(L)}(x) - y) = Q'(h^{(L)}(x) - y) \odot \sigma_L'(z^{(L)}(x)).\)

For \(l = L-1\) down to \(0\), compute

  • \(\partial_{W^{(l)}_{i, j}} Q_{\theta}(x, y) = h_{i}^{(l)} \delta^{(l+1)}_{j};\)
  • \(\partial_{b^{(l)}_{j}} Q_{\theta}(x, y) = \delta^{(l+1)}_{j};\)
  • for \(l \geq 1\): \(\delta^{(l)} = \sigma_l'(z^{(l)}(x)) \odot (W^{(l)} \delta^{(l+1)}).\)

Note that the gradients for \(W^{(0)}\) and \(b^{(0)}\) already use \(\delta^{(1)}\), so the recursion for \(\delta^{(l)}\) is only needed while \(l \geq 1\).

Listing 1: the backpropagation algorithm of DNNs.
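The full algorithm can be sketched end to end in NumPy. This is a minimal illustration under stated assumptions: tanh activations, the squared-error loss \(Q(u) = \frac{1}{2}\Vert u \Vert^2\) (so \(Q'(u) = u\)), and illustrative random sizes and values; it also verifies one weight gradient against a finite difference.

```python
import numpy as np

def sigma(z):  return np.tanh(z)
def dsigma(z): return 1.0 - np.tanh(z) ** 2  # tanh'(z)

def forward(x, Ws, bs):
    # Forward pass: store all z^(l) and h^(l) for the backward pass.
    hs, zs = [x], []
    for W, b in zip(Ws, bs):
        z = hs[-1] @ W + b
        zs.append(z)
        hs.append(sigma(z))
    return zs, hs

def backward(x, y, Ws, bs):
    # Backward pass in the article's notation, assuming
    # Q(u) = 0.5 * ||u||^2, hence Q'(u) = u.
    zs, hs = forward(x, Ws, bs)
    L = len(Ws)
    grads_W, grads_b = [None] * L, [None] * L
    # delta^(L) = Q'(h^(L) - y) ⊙ sigma_L'(z^(L))
    delta = (hs[-1] - y) * dsigma(zs[-1])
    for l in range(L - 1, -1, -1):
        # dQ/dW^(l)_{ij} = h_i^(l) * delta_j^(l+1): an outer product.
        grads_W[l] = np.outer(hs[l], delta)
        # dQ/db^(l)_j = delta_j^(l+1)
        grads_b[l] = delta
        if l > 0:
            # delta^(l) = sigma'(z^(l)) ⊙ (W^(l) delta^(l+1))
            delta = dsigma(zs[l - 1]) * (Ws[l] @ delta)
    return grads_W, grads_b

# Toy 3 -> 4 -> 2 network (illustrative sizes and values).
rng = np.random.default_rng(1)
Ws = [0.5 * rng.normal(size=(3, 4)), 0.5 * rng.normal(size=(4, 2))]
bs = [np.zeros(4), np.zeros(2)]
x, y = rng.normal(size=3), rng.normal(size=2)

gW, gb = backward(x, y, Ws, bs)

# Finite-difference check on a single weight entry W^(0)_{1,2}.
def Q_of(Ws_):
    _, hs = forward(x, Ws_, bs)
    return 0.5 * np.sum((hs[-1] - y) ** 2)

eps = 1e-6
Wp = [W.copy() for W in Ws]; Wp[0][1, 2] += eps
Wm = [W.copy() for W in Ws]; Wm[0][1, 2] -= eps
numeric = (Q_of(Wp) - Q_of(Wm)) / (2 * eps)
print(abs(gW[0][1, 2] - numeric) < 1e-6)
```

One forward and one backward sweep together cost about twice a plain evaluation of the network, which is why backpropagation is so much cheaper than differentiating each parameter independently.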

This article is from the free online

An Introduction to Machine Learning in Quantitative Finance

Created by
FutureLearn - Learning For Life
