Skip to 0 minutes and 1 second OK, so here we are looking at the regularisation, or the L2 regularisation, code exercise. Once again, you’ll find that in the Regularization L2.R script, and you can try to replicate what’s being done. The first section of code just generates the data we’re going to work with, so I’ll highlight and run that. Once again, the data we’ll work with is stored in a data frame called data set. Now, we’re going to generate some training, validation and test data.

Skip to 0 minutes and 37 seconds If you’re unfamiliar with why we’re doing this, we talk more about the reasons behind generating validation and test data elsewhere, but essentially we will train our models on the training data, see how they perform on the validation data, and use that to work out which model is best. And then we’ll get an unbiased estimate of how well that final model performs by evaluating it on the test data. So we’ll just randomly divide our data set into these three subsets.
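The random three-way split described here can be sketched as follows. This is a minimal illustration, not the exercise script itself: the synthetic data generator, the name `dataset`, and the 50/25/25 proportions are all assumptions made for the example.

```r
# Hypothetical stand-in for the exercise's synthetic data:
# two inputs (x1, x2) and one output (y). The generating function is assumed.
set.seed(1)
n <- 300
dataset <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
dataset$y <- sin(dataset$x1) * dataset$x2^2 + rnorm(n, sd = 0.5)

# Shuffle the row indices once, then cut them into three disjoint blocks.
# The 50/25/25 proportions are an assumption, not taken from the exercise.
idx     <- sample(n)
n_train <- floor(0.50 * n)
n_valid <- floor(0.25 * n)

train <- dataset[idx[1:n_train], ]
valid <- dataset[idx[(n_train + 1):(n_train + n_valid)], ]
test  <- dataset[idx[(n_train + n_valid + 1):n], ]
```

Shuffling the indices once and slicing them guarantees that no row appears in more than one subset.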

Skip to 1 minute and 11 seconds Now, we’re going to be looking at the effect of changing the L2 regularisation parameter. This is normally referred to as lambda. So we generate a vector of possible lambda values. Here we go. And now I’m going to create a series of neural network models. Our variables are x and y, so I’m just going to give the formula y tilde dot. And I’m going to be using a single hidden layer with four nodes. Each model is going to have a different L2 regularisation parameter chosen from this vector of lambda values. So let’s produce our seven models here, I think it is.
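A sketch of this step using the nnet package, whose `decay` argument is its weight-decay (L2) penalty. The particular lambda values below are assumed (the video only indicates that there are about seven and that 0.001 is among them), as are the synthetic training data and the `maxit` setting.

```r
library(nnet)

# Hypothetical training data with two inputs and one output.
set.seed(1)
n <- 200
train <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
train$y <- sin(train$x1) * train$x2^2 + rnorm(n, sd = 0.5)

# Seven candidate L2 penalties (assumed values); 0 means no regularisation.
lambdas <- c(0, 0.0001, 0.001, 0.01, 0.1, 1, 10)

# One network per lambda: y ~ . regresses y on all other columns,
# size = 4 gives a single hidden layer with four nodes,
# linout = TRUE makes the output unit linear, as needed for regression.
models <- lapply(lambdas, function(lambda) {
  nnet(y ~ ., data = train, size = 4, decay = lambda,
       linout = TRUE, maxit = 500, trace = FALSE)
})
```

Keeping the models in a list in the same order as `lambdas` makes it easy to look up which penalty produced which model later on.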

Skip to 2 minutes and 8 seconds These are done. What we’re going to do now is see how well each of these models performs on the validation data. We’re also going to store the residuals from the models on the validation data. Those are the errors of the models. On the basis of how well the models do on the validation data, we’re going to select the best model. And we’re going to get an unbiased estimate of how good that best model is by seeing how it performs on the test data. In all these cases, we’re going to be using the mean squared error loss function.
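The select-on-validation, report-on-test procedure might look like the sketch below. The data, split sizes, lambda values and variable names are assumptions for illustration; only the overall procedure (validation residuals, MSE, pick the minimum, then test MSE) follows the description above.

```r
library(nnet)

# Hypothetical data and split (see assumptions in the lead-in).
set.seed(1)
n <- 300
dataset <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
dataset$y <- sin(dataset$x1) * dataset$x2^2 + rnorm(n, sd = 0.5)
idx   <- sample(n)
train <- dataset[idx[1:150], ]
valid <- dataset[idx[151:225], ]
test  <- dataset[idx[226:300], ]

# One model per candidate L2 penalty (assumed values).
lambdas <- c(0, 0.0001, 0.001, 0.01, 0.1, 1, 10)
models <- lapply(lambdas, function(lambda) {
  nnet(y ~ ., data = train, size = 4, decay = lambda,
       linout = TRUE, maxit = 500, trace = FALSE)
})

# Residuals on the validation data: one column per model.
valid_resid <- sapply(models, function(m) {
  valid$y - as.vector(predict(m, newdata = valid))
})

# Mean squared error per model, and the index of the best one.
valid_mse  <- colMeans(valid_resid^2)
best       <- which.min(valid_mse)
best_model <- models[[best]]

# The test data are touched only once, for the chosen model.
test_mse <- mean((test$y - as.vector(predict(best_model, newdata = test)))^2)

print(data.frame(lambda = lambdas, valid_mse = valid_mse))
cat("best lambda:", lambdas[best], " test MSE:", test_mse, "\n")
```

Because the best model was chosen using the validation data, its validation MSE is optimistically biased; the test MSE is the number to report.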

Skip to 2 minutes and 45 seconds Then we’re going to do something a little bit more advanced that we’ll talk more about in the analysis of models section. So after we talk about this, you might want to come back and look at this video again. We’re just going to do a statistical test to see if we can be confident that the expected mean squared error of the chosen model is better, that is, less than that of a model that has no regularisation penalty. This is a typical significance test that will give us a p-value, telling us how probable it is that we would see this difference in performance between the two models if in reality the two models performed equally well over infinite amounts of new data.
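The video does not say exactly which test the script uses. One common choice, assumed here, is a one-sided paired t-test on the per-observation squared errors of the two models, evaluated on held-out data. The data, split and lambda value of 0.001 for the regularised model are likewise assumptions for illustration.

```r
library(nnet)

# Hypothetical data, split into training and held-out sets.
set.seed(1)
n <- 300
dataset <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
dataset$y <- sin(dataset$x1) * dataset$x2^2 + rnorm(n, sd = 0.5)
idx   <- sample(n)
train <- dataset[idx[1:200], ]
test  <- dataset[idx[201:300], ]

# Helper (hypothetical name): fit a 4-node network with a given L2 penalty.
fit <- function(lambda) {
  nnet(y ~ ., data = train, size = 4, decay = lambda,
       linout = TRUE, maxit = 500, trace = FALSE)
}
reg_model   <- fit(0.001)  # regularised model
noreg_model <- fit(0)      # no regularisation penalty

# Squared error of a model on each held-out observation.
sq_err <- function(m) (test$y - as.vector(predict(m, newdata = test)))^2

# Paired, because both models are scored on the same observations.
# alternative = "less" tests whether the regularised model's expected
# squared error is lower than the unregularised model's.
result  <- t.test(sq_err(reg_model), sq_err(noreg_model),
                  paired = TRUE, alternative = "less")
p_value <- result$p.value
print(p_value)
```

A small p-value would suggest that the lower error of the regularised model is unlikely to be due to chance alone; a large one means the data do not rule out the two models performing equally well.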

Skip to 3 minutes and 34 seconds So let’s do this. We’re seeing how all the models perform on the validation data.

Skip to 3 minutes and 43 seconds And we’re going to get the mean squared error for how they performed on the validation data. And we’ll see which one is best, use that one, see how it performs on the test data and get a test mean squared error as well. And finally, we’ll get a p-value telling us how confident we can be that the best model is performing better–

Skip to 4 minutes and 14 seconds will perform better than a model that has no regularisation penalty at all.

Skip to 4 minutes and 21 seconds And we’ll just get an output telling us the results of all these tests. We see the best model was with the lambda value of 0.001. It had a 28.9 validation mean squared error and a 47.3 test mean squared error. Now, that’s a bit of a worrying jump. We’ll talk about why seeing a jump like that is worrying when we talk about analysing models. But we do see also that the p-value says, hey– the p-value here is 0.4. And again, we’ll talk about this when we talk about analysing models. But we can be pretty confident that the L2 regularisation really did work and help here.

Skip to 5 minutes and 3 seconds So we found the best model, the model with the lambda value 0.001. We’ll just store our best model in the model. And we’ll do a plot to give us some idea of visualising the best model. What we’re going to do– and you can go through the lines that do this yourselves– we’re generating a 3D plot that’s giving us the neural network regression surface on the inputs x1 and x2. Our output, of course, was y. The black points are the data points, and the little red lines are the errors, the residuals. So we saw that regularisation did help.

Skip to 5 minutes and 53 seconds Not only did we see that it performed best in terms of mean squared error, we even did a statistical test to show that this was not just an accident. And the statistical test came back saying, yes, we can be pretty confident that there really is a statistically significant difference between the model generated with the regularisation parameter of 0.001 and one generated with no regularisation. So a lovely little example of how L2 regularisation does work.

# L2 Regularization Exercise

A code-exercise video for L2 Regularization. In this video we work through the problem found in the *Regularization L2.R* file.

In detail, we look at the effect of changing the L2 regularization tuning parameter on basic neural network models as part of a simple regression problem on synthetic data.

We divide the data into training, validation and test data before then training a series of neural network models, each with different L2 regularization penalties, using the training data. We then examine how well these models perform on the validation data in order to select the best performing model based on mean squared error. We finally obtain an unbiased estimate of the performance of our chosen model by examining its performance (MSE) on the test data.

Foreshadowing topics that we will cover in the *Evaluation of Statistical Models* step in week two, we also perform a statistical test to obtain an indication of how accurate the estimate of our chosen model’s performance is, and how confident we can be that regularization improved the performance of our final model. You may want to review this video after we cover these topics in week two.

Note that the *stats*, *rgl*, *nnet* and *gdata* R packages are used in this exercise. You will need to have them installed on your system. You can install packages using the *install.packages* function in R.

© Dr Michael Ashcroft