

Early Stopping and Overfitting

A discussion of the intuitive idea behind using early stopping to avoid overfitting.
Now, we’ve already talked a little bit about overfitting. The idea of overfitting is that you get a model that does very well on the training data but is unlikely to do well on new data: it’s not generalisable, because it’s fitting the noise in the training data. And we can see that this red curve is doing that, as opposed to, say, the green curve. In an example like this, it’s immediately obvious that overfitting is related to what we call the smoothness of the function. The green function is quite smooth; the red function is not at all smooth. And when you overfit, you often end up with a function that’s not very smooth.
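The smooth-versus-wiggly contrast can be sketched in a few lines of code. This is a hypothetical illustration, not from the article: we fit noisy samples of a smooth underlying function with a low-degree polynomial (playing the role of the green curve) and a high-degree one (the red curve), then compare their errors on held-out points.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y_true = np.sin(2 * np.pi * x)                    # smooth underlying function
y = y_true + 0.2 * rng.standard_normal(x.shape)   # noisy training data

smooth_fit = np.polyfit(x, y, deg=3)   # "green" curve: smooth
wiggly_fit = np.polyfit(x, y, deg=9)   # "red" curve: chases the noise

# Evaluate both fits on held-out points between the training samples.
x_new = np.linspace(0.025, 0.975, 50)
err_smooth = np.mean((np.polyval(smooth_fit, x_new) - np.sin(2 * np.pi * x_new)) ** 2)
err_wiggly = np.mean((np.polyval(wiggly_fit, x_new) - np.sin(2 * np.pi * x_new)) ** 2)
print(err_smooth, err_wiggly)  # the wiggly fit's held-out error is typically larger
```

The high-degree fit tracks the training points more closely, but that extra flexibility is spent on noise, which is exactly the overfitting picture the red and green curves describe.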
Now, consider what happens when you optimise a loss function.
So here we have the loss function, and here we have one dimension of parameters. As we optimise the loss function, we seek to minimise it: we move from wherever we started down along the loss surface, following the gradient, until we find a local optimum or, hopefully, the global optimum. Now, if we’re overfitting, then as we continue moving along this loss surface toward the optimal point, our function is going to get less and less smooth. We might start up here with the green one, but by the time we come down here, we’re with the red one.
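The "follow the gradient downhill" process can be sketched concretely. This is a minimal one-dimensional example (the loss function and its minimum are my own illustrative choices, not from the article): gradient descent on the loss L(w) = (w − 3)² walks the parameter from its starting point down to the optimum.

```python
# Gradient descent on a one-dimensional loss surface L(w) = (w - 3)**2,
# whose global optimum is at w* = 3.
def grad(w):
    return 2.0 * (w - 3.0)   # dL/dw

w = 10.0                     # wherever we started
lr = 0.1                     # step size
for _ in range(100):
    w -= lr * grad(w)        # move downhill along the loss surface

print(round(w, 4))           # → 3.0, essentially at the optimum
```

Each step moves the parameter a fraction of the way toward the minimum, which is exactly the "moving down the loss surface" picture in the text.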
Now, if this is the case, it makes sense to think: if I stop before I move all the way down to the optimum of the loss surface, then I’m likely to get a function that does not overfit. This is a very ad hoc idea, this thought that, if reaching the optimal point on the loss surface would give me a function that overfits, then stopping before I get there might give me a function that is smoother and overfits less. It’s an intuitive idea, but it’s also very ad hoc. It might be that you get a function that’s terrible.
But it might be that you’re lucky, and you do indeed get a function that is smoother and likely to generalise. This is the idea behind early stopping.
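The naive version of early stopping described above can be sketched by reusing the same one-dimensional descent (again an illustrative example, not from the article): we simply halt after a fixed budget of steps, well before the optimum. Practical variants instead monitor a held-out validation loss and stop when it starts rising; here we only illustrate the "stop early" idea itself.

```python
# Naive early stopping: halt gradient descent after a fixed number of
# steps, before reaching the optimum of L(w) = (w - 3)**2 at w* = 3.
def grad(w):
    return 2.0 * (w - 3.0)

def descend(steps, w=10.0, lr=0.1):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_early = descend(steps=5)    # stopped early: still some way from w* = 3
w_full  = descend(steps=100)  # run (almost) all the way to the optimum

print(w_early, w_full)        # the early-stopped parameter sits farther from the optimum
```

If low loss corresponds to a wiggly, overfit function, then the early-stopped parameters, being farther from the optimum, may correspond to a smoother function — which is precisely the (ad hoc) hope the article describes.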
A discussion of the intuitive idea behind using (the naive version of) early stopping to avoid overfitting. We discuss the intuitive relation between overfitting, function smoothness and model complexity, in the context of the optimisation of a loss function.
This article is from the free online course Advanced Machine Learning, created by FutureLearn - Learning For Life.
