
Loss Functions in Machine Learning

One of the key factors in machine learning is the loss function. This tells the machine learning algorithm how well the trained system is currently performing. The goal of learning is to reduce the value of this loss function, i.e. to make our machine perform better.

Supervised Learning

In supervised learning, our training data provides us with the correct or desired output – known as the label – for each corresponding input. The loss function compares the label against the output that our system currently predicts. It gives us back a non-negative number indicating the disagreement or error between the desired and predicted outputs. A loss value of zero means perfect performance.

Squared Error

Loss functions come in many flavours depending on the task being solved. Perhaps the simplest loss function is the squared error. For example, consider a system that is learning to predict the age of a person from a photograph. Suppose it outputs its current guess in years (21.3) and we compare this against the person’s known actual age in years (39.9). We take the difference (-18.6) and square it (345.96). The machine learning algorithm now knows that the system is not performing perfectly for this input and will try to adapt the machine to improve performance. We’ll start to understand how this is done below.
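
As a concrete sketch, the squared-error calculation above can be written as a short Python function (the names here are illustrative, not from the course materials):

    def squared_error(predicted, actual):
        """Return the squared difference between a prediction and its label."""
        return (predicted - actual) ** 2

    # The age-prediction example from the text:
    print(squared_error(21.3, 39.9))  # (-18.6) squared = 345.96, up to floating-point rounding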

In practice, of course, we don’t have just one training example; we have many (usually tens of thousands or even millions for a computer vision problem). So, we need to combine the loss values for all training examples. Typically, this means either taking the average (giving us the mean squared error in this case) or summing them up (the sum of squared errors).
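
Combining the per-example losses in code might look like the following sketch, assuming NumPy and some made-up predictions and labels:

    import numpy as np

    predictions = np.array([21.3, 35.0, 50.2])  # hypothetical system outputs (years)
    labels = np.array([39.9, 33.1, 48.0])       # hypothetical true ages (years)

    errors = predictions - labels
    sse = np.sum(errors ** 2)   # sum of squared errors
    mse = np.mean(errors ** 2)  # mean squared error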

Parametric Machine Learning

The “function we’re trying to learn” is the mapping from input to output. We’ll call this \(f\). This could be any function of any complexity. You may have studied linear functions or quadratic functions in mathematics. These would be possible choices, though much too simple to work well on any serious problem. You will have seen many more complicated mathematical functions such as trigonometric functions, exponentials or higher-order polynomials. Again, these might work as part of a larger, more complicated function. The problem is that the set of possible functions is infinite. How can we choose a suitable one?

Instead of trying to pick from that infinite set, most machine learning methods use a function of fixed form with a fixed number of parameters. The behaviour of the function is then determined by the values of those parameters. So now, instead of trying to choose a function, we reduce the problem to adapting the values of the parameters. This is called parametric machine learning. Let’s take a very simple example in which the input to our machine is a single number \(x\), and the output is also a single number obtained by applying the following function:

\[f(x) = w_1 x + w_2\]

In this case, the output is a linear function of the input and the function itself depends on two parameters: \(w_1\) and \(w_2\). For real problems, many more parameters would be required. For example, it’s not unusual for an image classification network to have millions, even tens of millions, of parameters. We’ll call the set of all of the parameters of the network \(\mathbf{w}\), and we’ll write the function that depends on \(\mathbf{w}\) as \(f_{\mathbf{w}}\).
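
To make the parametric idea concrete, here is a minimal Python sketch of this two-parameter linear model (the parameter values are arbitrary placeholders):

    # The model's behaviour is fixed entirely by its two parameters.
    w1, w2 = 0.5, 1.0  # arbitrary starting values

    def f(x):
        """Linear model f(x) = w1 * x + w2."""
        return w1 * x + w2

    print(f(2.0))  # 0.5 * 2.0 + 1.0 = 2.0

Learning then means adjusting \(w_1\) and \(w_2\), not redesigning \(f\) itself.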

Learning = Optimisation

Now we come to a slightly anticlimactic discovery. When we talk about machine learning, all we really mean is adjusting the parameters of our function in order to reduce the loss, averaged over our whole set of training data, until we reach a point where it can’t be reduced any further. Adjusting parameters to find the minimum of a function is called optimisation.

We can start to write this down mathematically. Let’s say we have lots of training examples, each comprising an input \(x_i\) and a corresponding desired output \(y_i\) (i.e. our label for that input). The prediction our machine makes for input \(x_i\) is \(f_{\mathbf{w}}(x_i)\). Therefore our loss function should compare \(f_{\mathbf{w}}(x_i)\) and \(y_i\). We’ll write this as \(E(f_{\mathbf{w}}(x_i),y_i)\), where \(E\) is our loss function, i.e. the “error”. But remember, we have lots of training examples, not just one input/label pair. Let’s do the simplest thing possible and just add up the loss over all of our \(n\) training examples:

\[\sum_{i=1}^n E(f_{\mathbf{w}}(x_i),y_i)\]

You may well have seen the “sigma notation” used here in your maths studies, but in case you haven’t, it’s just a compact way of writing:

\[E(f_{\mathbf{w}}(x_1),y_1) + E(f_{\mathbf{w}}(x_2),y_2) + \dots + E(f_{\mathbf{w}}(x_n),y_n)\]
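
In code, the sigma is just a loop (or a sum) over the training pairs. A minimal sketch, with the squared-error loss and the linear model from above written inline:

    def total_loss(w1, w2, inputs, labels):
        """Sum over all examples of E(f_w(x_i), y_i), with squared-error E
        and the linear model f_w(x) = w1 * x + w2."""
        return sum(((w1 * x + w2) - y) ** 2 for x, y in zip(inputs, labels))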

Finally, we can write down the goal of machine learning:

Find the \(\mathbf{w}\) that makes the following as small as possible:

\[\sum_{i=1}^n E(f_{\mathbf{w}}(x_i),y_i)\]
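
The article doesn’t say how to perform this minimisation; one standard approach is gradient descent, sketched here for the linear model and squared-error loss above (the data, starting values and learning rate are all made up for illustration):

    # For the squared error (f(x) - y)**2 with f(x) = w1 * x + w2, the partial
    # derivatives are 2 * (f(x) - y) * x with respect to w1, and 2 * (f(x) - y)
    # with respect to w2.
    inputs = [1.0, 2.0, 3.0]
    labels = [3.0, 5.0, 7.0]  # generated from w1 = 2, w2 = 1, so we know the answer
    w1, w2 = 0.0, 0.0         # arbitrary starting parameters
    lr = 0.01                 # learning rate, chosen by hand

    for _ in range(5000):
        grad_w1 = sum(2 * ((w1 * x + w2) - y) * x for x, y in zip(inputs, labels))
        grad_w2 = sum(2 * ((w1 * x + w2) - y) for x, y in zip(inputs, labels))
        w1 -= lr * grad_w1  # nudge each parameter downhill on the loss surface
        w2 -= lr * grad_w2

    print(w1, w2)  # approaches 2.0 and 1.0 as the total loss shrinks towards zero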

As machine learning engineers, it’s our job to choose the loss function and the form of \(f\), and to decide what to use as our training data (inputs and labels).

© University of York
This article is from the free online course Intelligent Systems: An Introduction to Deep Learning and Autonomous Systems.
