
Deep Learning History

As machine learning engineers, part of our job is to decide on the form of the function that maps inputs to outputs. We’re going to take a brief historical look at what has led to the conclusion that something called deep learning is the best option and what that means in terms of the form of the functions we should use.
[Image: Deep learning graphic, in the style of The Matrix. © University of York]

Feature Engineering

Usually, our raw input data (such as an audio stream or image) is very high-dimensional. This means that we have many thousands or even millions of input values. Devising a function by hand that can deal with such complexity is very hard, and this led researchers for several decades to focus on something called feature engineering. The idea was to handcraft (i.e. design by hand, or engineer) some features: quantities that can be extracted from your raw data and that somehow summarise or simplify your high-dimensional input.

Edge Detection

For example, let’s say that you want to recognise an object in an image. You might decide that the outline of the object is a good feature to use. So, now you need a way to extract the boundary of objects from images. This task is called edge detection. Perhaps you define an edge as a location where the colour changes rapidly (i.e. as you cross the boundary from object to background, you expect a sharp change in colour). But now you start to run into problems. What threshold should you use to define when a change in colour is caused by an edge? Will that threshold always work? What about when an object is in front of a background that is the same colour as the object, so there is no obvious edge? What about objects with internal texture that will cause lots of edges to be detected within the object?
You can probably think of lots more problems. And, who’s to say object boundaries are a good feature to use anyway?
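To make the threshold problem concrete, here is a minimal sketch of the kind of hand-engineered edge detector described above. It assumes a grayscale NumPy image with values in [0, 1] (an assumed input format, not part of the original text), and the `threshold` parameter is exactly the arbitrary, hand-picked choice that causes the problems listed above.

```python
import numpy as np

def detect_edges(image, threshold=0.2):
    """Naive edge detector: mark pixels where intensity changes rapidly.

    `image` is a 2-D array of grayscale values in [0, 1]. `threshold`
    is the hand-picked cut-off discussed in the text -- a choice that
    rarely transfers well from one image to another.
    """
    # Vertical and horizontal intensity differences (finite differences).
    gy, gx = np.gradient(image)
    magnitude = np.sqrt(gx**2 + gy**2)
    # A pixel counts as an "edge" if the local change exceeds the threshold.
    return magnitude > threshold

# A tiny image: dark region on the left, bright region on the right.
img = np.zeros((4, 4))
img[:, 2:] = 1.0
edges = detect_edges(img, threshold=0.2)
```

Running this marks the two columns either side of the dark-to-bright boundary as edges. Lower the threshold and texture or noise floods in; raise it and genuine but low-contrast boundaries disappear.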
Despite its limitations, feature engineering was dominant in computer vision until the mid-2010s. Some methods were actually quite successful. For example, the Scale Invariant Feature Transform (SIFT, proposed in 1999) [1] was a way of finding interesting points in an image and then describing them very distinctively, so that the same point could be found again in another image. The approach is still quite competitive with state-of-the-art techniques for some problems. However, even then, researchers noticed in 2012 that if you took the feature descriptors produced by SIFT and applied a square root to the values, performance on many tasks improved by about 5% [2]! This situation is clearly somewhat ridiculous. Why square root? And why SIFT in the first place? How can you be sure that there isn’t some small modification you could make to your features that would boost performance on the task you’re trying to solve?
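The square-root trick from the 2012 paper [2] is simple enough to sketch. The paper L1-normalises each descriptor and then takes the element-wise square root, so that Euclidean comparisons on the result behave like the Hellinger kernel on the originals. The dummy descriptors below are an illustrative assumption; real SIFT descriptors are 128-dimensional histograms of non-negative values.

```python
import numpy as np

def root_sift(descriptors, eps=1e-7):
    """Apply the square-root trick of Arandjelovic & Zisserman (2012).

    Each row of `descriptors` is one SIFT descriptor. Step 1: L1-normalise
    the row. Step 2: take the element-wise square root. `eps` guards
    against division by zero for an all-zero descriptor.
    """
    descriptors = np.asarray(descriptors, dtype=np.float64)
    # L1-normalise each descriptor (each row).
    descriptors = descriptors / (np.abs(descriptors).sum(axis=1, keepdims=True) + eps)
    # Element-wise square root.
    return np.sqrt(descriptors)

# Two dummy "descriptors" (real SIFT descriptors have 128 non-negative bins).
d = np.array([[4.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])
r = root_sift(d)
```

After the transform every descriptor has (approximately) unit Euclidean length, which is what makes ordinary Euclidean matching on the transformed vectors equivalent to Hellinger-kernel matching on the originals.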

End-to-End Learning

This argument motivates the idea of end-to-end learning. The idea is that you will learn in one go the entire mapping from raw input data to final output, without hand engineering any of the features used to solve the problem. Hence, the machine learning algorithm will have to learn low-level features (things that can be immediately calculated from the raw input), mid-level features (more abstract concepts that arise out of a combination of low-level features, probably with some invariance to unimportant sources of variation) and high-level features (abstract descriptions of the contents of the input data). Taking the example of images, a low-level feature might be an edge, a mid-level feature an eye, and a high-level feature the identity of a face in an image. But remember: all of this will be learnt; you don’t design any of these features in advance.

Deep Learning

It’s clear that the function we’re trying to learn is going to be complicated. How on earth can an image be mapped to the identity of the face in the image? It turns out that the best way to construct functions of sufficient complexity is to build them out of a composition of lots of simple functions applied one after another. So, our overall function \(f_w(x)\) is defined as:
\[ f_w(x) = f^n_{w_n}(\dots f^2_{w_2}(f^1_{w_1}(x))) \]
This means we first apply \(f^1\) to \(x\), and this function has its own parameters \(w_1\). Then, we apply \(f^2\) to the result of \(f^1\), then \(f^3\) to that result, and so on. The “deep” in deep learning refers to the application of many functions one after another to the input. This depth turns out to provide immense power to the overall function in terms of what it can represent.
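The composition above can be sketched directly in code. This is a toy illustration, not a real network: each \(f^i\) is assumed here to be an affine map followed by a ReLU (one common choice of “simple function”), and random initial values stand in for the learnt parameters \(w_i\).

```python
import numpy as np

def layer(w, b):
    """One simple function f^i with its own parameters (w, b).

    Here each f^i is an affine map followed by a ReLU nonlinearity --
    an assumed, illustrative choice of "simple function".
    """
    return lambda x: np.maximum(0.0, w @ x + b)

# Parameters w_1, w_2, w_3 (randomly initialised stand-ins for learnt values).
rng = np.random.default_rng(0)
f1 = layer(rng.standard_normal((5, 3)), np.zeros(5))  # 3 inputs -> 5 values
f2 = layer(rng.standard_normal((4, 5)), np.zeros(4))  # 5 -> 4
f3 = layer(rng.standard_normal((2, 4)), np.zeros(2))  # 4 -> 2 outputs

def f(x):
    # Apply f1 first, then f2 to its result, then f3 to that result:
    # the "depth" is just this repeated application of simple functions.
    return f3(f2(f1(x)))

y = f(np.ones(3))
```

With only three layers this is shallow, but the same pattern of repeated application is what gives genuinely deep models their representational power.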

References

  1. Lowe, David G. “Distinctive image features from scale-invariant keypoints.” International Journal of Computer Vision 60.2 (2004): 91–110.
  2. Arandjelović, Relja, and Andrew Zisserman. “Three things everyone should know to improve object retrieval.” 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012.
© University of York
This article is from the free online course Intelligent Systems: An Introduction to Deep Learning and Autonomous Systems.
