
This content is taken from The Open University & Persontyle's online course, Advanced Machine Learning.

[0:01] OK, so here we're going to look at some examples of the effects of L2 regularisation. Polynomial regression gives us a very interpretable way of showing these effects. I've manually coded a basic polynomial regression class: if you go down to the bottom of the Tidbits L2 Regularization.R file, you'll see a complete implementation of the polynomial regression model class. You can use it if you're interested in seeing how to actually implement statistical models in R, and it's an example of S3 object-oriented programming in R. What we're actually going to do with it, though, is run a demo of the effects of L2 regularisation.

[0:59] All we have to do is run this L2 reg demo. If you want to run it on your own computer, you'll first have to source all the code down here. To do that, just highlight it all and run it.

[1:22] That done, we can run the demo. What we see straight away is a sixth-order polynomial regression model fitted to a very small amount of data, and it is a model that is overfitted.

[1:45] One thing we could do to reduce the overfitting, of course, is to reduce the order of the polynomial regression model. Here we go from sixth order to fourth order, and the curve gets smoother. In fact, knowing the function the data was generated from, this is starting to look like a pretty good model. But there is another way to reduce the complexity without reducing the order of the model. When we reduce the order, we essentially reduce the number of parameters, because in polynomial regression the number of parameters is directly tied to the order of the polynomial: a polynomial of order d has d + 1 coefficients, including the intercept. So going from sixth order to fourth order here takes us from seven parameters to five.
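The effect of the order on the parameter count and on the training fit can be sketched with ordinary least squares. This is a minimal, dependency-free Python sketch, not the course's R implementation: the toy data (nine noisy samples of sin(πx) on [-1, 1]) and the helper names `design_matrix`, `solve`, `fit_poly` and `sse` are hypothetical. It shows that an order-d fit has d + 1 coefficients and that, on the training points, the sixth-order fit can never have a larger squared error than the fourth-order one (every fourth-order curve is also a sixth-order curve), which is why training error alone cannot reveal overfitting.

```python
import math
import random

def design_matrix(xs, order):
    # One column per power x^0 .. x^order, so an order-d model has d + 1 parameters.
    return [[x ** j for j in range(order + 1)] for x in xs]

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small square system a z = b.
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def fit_poly(xs, ys, order):
    # Ordinary least squares via the normal equations (X^T X) beta = X^T y.
    x = design_matrix(xs, order)
    p = order + 1
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(x, ys)) for i in range(p)]
    return solve(xtx, xty)

def sse(beta, xs, ys):
    # Sum of squared residuals of the polynomial with coefficients beta.
    return sum((y - sum(bj * x ** j for j, bj in enumerate(beta))) ** 2
               for x, y in zip(xs, ys))

# Hypothetical toy data: nine noisy samples of sin(pi * x) on [-1, 1].
rng = random.Random(0)
xs = [-1 + 0.25 * i for i in range(9)]
ys = [math.sin(math.pi * x) + rng.gauss(0, 0.2) for x in xs]

beta6 = fit_poly(xs, ys, 6)
beta4 = fit_poly(xs, ys, 4)
print(len(beta6), len(beta4))  # 7 5
print(f"train SSE: order 6 = {sse(beta6, xs, ys):.4f}, order 4 = {sse(beta4, xs, ys):.4f}")
```

Solving the normal equations directly is fine for a toy problem this small; for real work a QR- or SVD-based least-squares solver is numerically safer (R's `lm()` uses a QR decomposition).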

[2:27] We've thrown away the parameters corresponding to the fifth- and sixth-order terms. Instead of throwing these parameters away, we could use L2 regularisation, which adds a penalty proportional to the sum of the squared parameters to the loss being minimised, to restrict their freedom to take on any value. Here, for example, is a sixth-order polynomial regression model with lambda, the L2 regularisation parameter, set to 0.01. Comparing the original model with no regularisation against the model with 0.01 regularisation, we see that the curve has become much smoother. In fact, it now looks quite similar to the fourth-order model, and we've done this without reducing the number of parameters. Instead, we've regularised them.
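In symbols, one common way to write the L2-regularised (ridge) objective for an order-d polynomial fitted to n points, together with its closed-form solution, is shown below. Two assumptions are worth flagging: the intercept β₀ is left unpenalised, and the loss is a plain sum of squares. Implementations that penalise β₀, or scale the loss by 1/n or 1/2, produce the same kind of smoothing but give a value such as λ = 0.01 a different numerical meaning, so it need not match the course's R code exactly.

```latex
\hat{\boldsymbol{\beta}}_\lambda
  = \arg\min_{\boldsymbol{\beta}}
    \sum_{i=1}^{n} \Big( y_i - \sum_{j=0}^{d} \beta_j x_i^{j} \Big)^2
    + \lambda \sum_{j=1}^{d} \beta_j^{2}
  = \big( X^\top X + \lambda D \big)^{-1} X^\top \mathbf{y},
\qquad X_{ij} = x_i^{\,j}, \quad D = \operatorname{diag}(0, 1, \dots, 1)
```

Setting λ = 0 recovers ordinary least squares; as λ grows, the penalised coefficients β₁, …, β_d are pulled towards 0 while the number of parameters stays at d + 1.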

[3:17] As we increase the L2 regularisation penalty, the regression curve produced by the model gets smoother and smoother. Eventually, if we increase it enough, the model degenerates: the penalty pulls all the parameters close to 0. In between, there is a sweet spot where adding the right degree of regularisation reduces the overfitting by an optimal amount, accepting a little more bias in exchange for lower variance. We're moving the model from right to left on the complexity-versus-loss graph that we saw when looking at the bias-variance decomposition. This is a really lovely example of how L2 regularisation smooths our modelling functions without reducing the number of parameters.
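The λ sweep can be sketched in plain Python as well; again this is not the course's R code. It assumes the intercept is left unpenalised (a common convention; the course's implementation may differ), uses hypothetical toy data, and the helper names are hypothetical. It fits a sixth-order model for several λ values, prints the L2 norm of the penalised coefficients with the training and validation errors, and picks the λ with the smallest error on separate held-out points as a simple stand-in for the "sweet spot". Increasing λ never increases that norm and never decreases the training error; where the validation minimum lands depends on the data.

```python
import math
import random

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small square system a z = b.
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def fit_ridge(xs, ys, order, lam):
    # Solves (X^T X + lam * D) beta = X^T y with D = diag(0, 1, ..., 1), i.e.
    # minimises SSE + lam * sum(beta_j^2 for j >= 1).
    # Assumption: the intercept beta_0 is not penalised.
    p = order + 1
    x = [[xv ** j for j in range(p)] for xv in xs]
    a = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    for j in range(1, p):
        a[j][j] += lam
    b = [sum(row[i] * y for row, y in zip(x, ys)) for i in range(p)]
    return solve(a, b)

def predict(beta, x):
    return sum(bj * x ** j for j, bj in enumerate(beta))

def sse(beta, xs, ys):
    return sum((y - predict(beta, x)) ** 2 for x, y in zip(xs, ys))

def penalty_norm(beta):
    # L2 norm of the penalised coefficients beta_1 .. beta_d.
    return math.sqrt(sum(bj * bj for bj in beta[1:]))

# Hypothetical toy data: noisy samples of sin(pi * x) for training and validation.
rng = random.Random(0)
xs = [-1 + 0.25 * i for i in range(9)]
ys = [math.sin(math.pi * x) + rng.gauss(0, 0.2) for x in xs]
xs_val = [-0.875 + 0.25 * i for i in range(8)]
ys_val = [math.sin(math.pi * x) + rng.gauss(0, 0.2) for x in xs_val]

lambdas = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
fits = [fit_ridge(xs, ys, 6, lam) for lam in lambdas]
for lam, beta in zip(lambdas, fits):
    print(f"lambda={lam:<7} norm={penalty_norm(beta):8.3f} "
          f"train SSE={sse(beta, xs, ys):.3f} val SSE={sse(beta, xs_val, ys_val):.3f}")

# The validation-error minimum is a simple stand-in for the "sweet spot".
best_lam = min(zip(lambdas, fits), key=lambda t: sse(t[1], xs_val, ys_val))[0]
print("best lambda on validation data:", best_lam)
```

A single hold-out split is the simplest way to choose λ; cross-validation over several splits gives a less noisy estimate when data is this scarce.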

[4:20] Here again: sixth order, no regularisation. Fourth order, no regularisation. Sixth order, some regularisation. And then more and more regularisation. OK, next we'll see how to use L2 regularisation ourselves in an exercise.

L2 Regularization

A visual examination of the effects of L2 regularization: how increasing L2 regularization 'smooths' the associated model function, and how this should be understood in terms of the bias-variance decomposition.

This is not a video-exercise step: there is no optional code exercise associated with it. You can, though, look at the code used in this video, which can be found in the Tidbits L2 Regularization.R file. In particular, students interested in R programming can look at the custom implementation of polynomial regression models as an example of how basic statistical machine learning models can be implemented in R using S3-style object-oriented programming.
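S3 has no formal class definitions: an object is an ordinary R value carrying a `class` attribute, and a generic function such as `predict()` calls `UseMethod()`, which looks for a function named after the generic and the class (for example `predict.polyreg`), trying each entry of the class vector in turn and then `predict.default`. To keep the examples here in one language, the sketch below imitates that dispatch rule in Python. It is an analogy, not the course's R code, and the names `polyreg`, `method` and `use_method` are hypothetical.

```python
# Registry of methods keyed by (generic name, class name), standing in for R's
# convention of naming methods generic.class (e.g. predict.polyreg).
_methods = {}

def method(generic, cls):
    # Decorator that registers fn as the `generic` method for class `cls`.
    def register(fn):
        _methods[(generic, cls)] = fn
        return fn
    return register

def use_method(generic, obj, *args):
    # Rough analogue of R's UseMethod(): try each entry of the object's class
    # vector in order, then a "default" method, otherwise fail.
    for cls in obj["class"] + ["default"]:
        fn = _methods.get((generic, cls))
        if fn is not None:
            return fn(obj, *args)
    raise TypeError(f"no applicable method for '{generic}' applied to {obj['class']}")

def predict(obj, *args):
    # The generic function itself only dispatches.
    return use_method("predict", obj, *args)

def polyreg(coefficients):
    # Constructor: an S3-style object is a plain value tagged with a class.
    return {"class": ["polyreg"], "coefficients": list(coefficients)}

@method("predict", "polyreg")
def predict_polyreg(obj, xs):
    # Evaluate beta_0 + beta_1 x + ... + beta_d x^d at each x.
    return [sum(b * x ** j for j, b in enumerate(obj["coefficients"])) for x in xs]

model = polyreg([1.0, 2.0, 3.0])  # 1 + 2x + 3x^2
print(predict(model, [0.0, 1.0, 2.0]))  # [1.0, 6.0, 17.0]
```

Because dispatch walks the class vector, an object tagged with a hypothetical class vector `["ridge_polyreg", "polyreg"]` would fall back to the `polyreg` method unless a more specific one were registered, which is how S3 expresses inheritance.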

