Deep learning basics. Today we hear about deep learning everywhere. Now let's see what deep learning is. Deep learning is based on artificial neural networks, which are inspired by biological neural networks. Here is a picture of a neuron. A neuron is a cell that carries electrical impulses.
Each neuron consists of three parts: a cell body, dendrites, and a single axon. Dendrites are the branches of a neuron that receive signals from other neurons and pass them into the cell body. The axon can be over a meter long in humans and passes electrical signals to other neurons. If the signals received are strong enough to reach the threshold for an action potential, the neuron is activated. Inspired by the biological neuron, Frank Rosenblatt developed an early artificial neuron, called the Perceptron, in 1957. The Perceptron uses a weighted sum to represent the dendrites and a threshold to model the action potential. General-purpose computers were scarce in those days, so Dr. Rosenblatt designed a hardware device to implement the function.
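The weighted sum and threshold described above can be sketched in a few lines of Python. This is only an illustration, not Rosenblatt's original design: the function name `perceptron` and the hand-picked weights and threshold (chosen so the unit behaves like a logical AND) are assumptions made for this example.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


# Hypothetical, hand-picked values: with these the unit fires only
# when both inputs are 1, so it behaves like a logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_THRESHOLD = 1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], AND_WEIGHTS, AND_THRESHOLD))
```

Changing the threshold to 0.5 while keeping the same weights would make the same unit behave like a logical OR.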
Although the idea of the perceptron is very similar to the neurons in today's deep learning, Rosenblatt did not develop a mechanism to train multi-layer neural networks. In 1969, Marvin Minsky, co-founder of the MIT AI Lab, published a book with Seymour Papert called Perceptrons, which was widely read as concluding that neural networks were a dead end. The book argued that a single-layer perceptron cannot learn the simple Boolean function XOR, because XOR is not linearly separable. This publication contributed to the first AI winter, and neural network research was widely rejected by major machine learning conferences. The winter for neural networks lasted for more than a decade. The hero who came to the rescue was Geoffrey Hinton, who showed that XOR can be learned by multi-layer perceptrons trained with backpropagation.
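The XOR limitation can be seen in a small experiment. The sketch below uses the classic perceptron learning rule on a single unit; the function names, the learning rate, and the number of epochs are assumptions made for this illustration. The same training loop that fits AND perfectly can never classify all four XOR cases, because no single straight line separates them.

```python
def predict(w, b, x1, x2):
    """Single threshold unit: fire when the weighted sum plus bias is positive."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


def train_perceptron(samples, epochs=50, lr=1.0):
    """Perceptron learning rule: nudge the weights by the error on each sample."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - predict(w, b, x1, x2)
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b


def accuracy(samples, w, b):
    """Fraction of samples the trained unit classifies correctly."""
    hits = sum(1 for (x1, x2), t in samples if predict(w, b, x1, x2) == t)
    return hits / len(samples)


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND_DATA, *train_perceptron(AND_DATA))
xor_acc = accuracy(XOR_DATA, *train_perceptron(XOR_DATA))
print("AND accuracy:", and_acc)  # linearly separable, so training converges
print("XOR accuracy:", xor_acc)  # stays below 1.0 however long we train
```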
Although the idea had been conceived by other researchers before, it was Hinton's paper that clearly addressed the problems raised by Minsky. How does backpropagation work? Let me first introduce how a neural network works.
As we know, there are two stages in machine learning algorithms: training and inference. For inference, we make predictions by computing each layer's outputs in turn, starting from the input layer. This process is called the forward pass. During training, the predicted output is compared with the label to calculate the error. Then we propagate the error back to the neurons and adjust the weights. This process is called the backward pass. So what is the magical math formula used for backpropagation?
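As a rough illustration of the forward and backward passes, here is a minimal sketch for a tiny network with one hidden neuron and one output neuron. The network shape, the sigmoid activation, the squared-error loss, and all numeric values are assumptions chosen for this example; the video does not specify them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Forward pass: input -> hidden activation h -> output y."""
    h = sigmoid(w1 * x)
    y = w2 * h
    return h, y


def loss_fn(y, target):
    """Squared error between the prediction and the label."""
    return 0.5 * (y - target) ** 2


def backward(x, target, w2, h, y):
    """Backward pass: push the error from the output back to each weight."""
    dL_dy = y - target                   # error at the output
    dL_dw2 = dL_dy * h                   # gradient for the output weight
    dL_dh = dL_dy * w2                   # error passed back to the hidden neuron
    dL_dw1 = dL_dh * h * (1.0 - h) * x   # through the sigmoid, then to w1
    return dL_dw1, dL_dw2


# Hypothetical input, label, initial weights, and learning rate.
x, target = 1.5, 1.0
w1, w2 = 0.8, -0.4
lr = 0.1

h, y = forward(x, w1, w2)
loss_before = loss_fn(y, target)

grad_w1, grad_w2 = backward(x, target, w2, h, y)
w1_new = w1 - lr * grad_w1               # adjust the weights against the gradient
w2_new = w2 - lr * grad_w2

_, y_new = forward(x, w1_new, w2_new)
loss_after = loss_fn(y_new, target)
print("loss before:", loss_before, "loss after:", loss_after)
```

Each line in `backward` multiplies the gradient arriving from the layer after it by a local derivative, which is how the error travels from the output back to the earlier weights.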
It turns out to be age-old calculus: the chain rule. Of course, running backpropagation on modern neural networks requires more advanced machinery, such as building computational graphs. Fortunately, open-source deep learning frameworks such as Caffe, TensorFlow, PyTorch, and CNTK will do the work for us. We don't need to worry about the details. Gradient descent is the most widely used learning algorithm. It is a first-order iterative optimization algorithm for minimizing a function. To find a local minimum of the loss function, we take steps proportional to the negative of the gradient at the current point. The procedure is similar to walking downhill to find the lowest point in a valley.
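The update rule just described, stepping against the gradient, can be sketched as follows. The example function f(x) = (x - 3)^2, the starting point, the learning rate, and the step count are assumptions picked for illustration; this function is a simple "valley" whose lowest point is at x = 3.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step in the direction of the negative gradient."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


def f(x):
    """A simple valley-shaped function with its lowest point at x = 3."""
    return (x - 3.0) ** 2


def grad_f(x):
    """Derivative of f with respect to x."""
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_f, x0=-5.0)
print("estimated minimum at x =", x_min, "with f(x) =", f(x_min))
```

With a learning rate that is too large, the steps can overshoot the valley floor and diverge, which is why the rate is usually kept small.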
On the other hand, the algorithm that finds a local maximum of a function by following the positive gradient is called gradient ascent. In this example, the cost function is simple and the global minimum can be easily found. Such a function is called convex. However, in a complex, high-dimensional space, gradient descent is not guaranteed to find the global minimum. The good news is that researchers have found that there are many local minima that are almost as good as the global minimum, so it is not necessary to search for the global minimum.
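To see why the global minimum is not guaranteed on a non-convex function, consider the sketch below. The function f(x) = x^4 - 3x^2 + x is a hypothetical one-dimensional example with two valleys, not the cost function from the video; the starting points and learning rate are also assumptions. Gradient descent settles into whichever valley is downhill from where it starts.

```python
def f(x):
    """A non-convex function with two separate valleys."""
    return x ** 4 - 3.0 * x ** 2 + x


def grad_f(x):
    """Derivative of f with respect to x."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def gradient_descent(grad, x0, learning_rate=0.01, steps=1000):
    """Repeatedly step in the direction of the negative gradient."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Same algorithm, two hypothetical starting points.
x_left = gradient_descent(grad_f, x0=-2.0)
x_right = gradient_descent(grad_f, x0=2.0)

print("start at -2.0: x =", x_left, "f(x) =", f(x_left))
print("start at  2.0: x =", x_right, "f(x) =", f(x_right))
```

The two runs end in different valleys, and the one on the left is deeper. In this toy case the gap between the two minima is noticeable; the observation in the video is that in large networks many local minima tend to be nearly as good as the global one.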
Deep Learning Basics: Part 1
In this video, Prof. Lai explains the concept of deep learning.
Deep learning is based on artificial neural networks, which are inspired by biological neural networks. Prof. Lai first introduces the historical background of deep learning.
He then explains three topics: how backpropagation works, what the chain rule is, and what gradient descent is.