So now we’re going to look at an example exercise where we perform cross-validation. And we’re actually going to use cross-validation to work out a good value for a hyperparameter. So what are hyperparameters? Well, hyperparameters are not parameters of the model; they’re parameters of the algorithm that generates the model. The examples we’ve seen so far are the order of a polynomial regression model and the number of hidden nodes in a neural network. Since we’ve worked with both polynomial regression and neural networks in these exercises, we’ve actually been using hyperparameters all along. The obvious question, then, is: how can we work out a good value for a hyperparameter?
How can we work out what order a polynomial regression model should be, or how many hidden nodes a neural network should have? Well, the simplest way is to build models with different values of the hyperparameter, and then perform model selection on the resulting models. That’s what we’re going to do. In this example, we’ll do it with polynomial regression: we’ll build a series of polynomial regression models of different orders, and then we’ll evaluate each model’s performance on validation data to see which one performs best. OK, so this example is in cross-validation EX1.
Like always, we prepare the data.
We’ve just got a synthetic y versus x. We’re going to do a two-way split, which will give us training and test data. But of course, in this case, because we’re doing cross-validation, the training data will actually participate in the validation itself. How does that work? Well, remember that cross-validation splits the training data into subsets, builds a model on every subset bar one, and evaluates that model on the remaining held-out subset. It does this holding out each subset in turn. Of course, you’ve gone over that in the article, so we don’t need to labour the point.
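The fold construction just described can be sketched in a few lines. Note that the exercise itself is written in R; this Python sketch, including the function name `kfold_indices` and the seed, is purely illustrative.

```python
import random

def kfold_indices(n, k=10, seed=1):
    """Shuffle the row indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# With n = 100 and k = 10 we get ten disjoint folds of ten indices each.
# Each fold is held out once; the other k-1 folds form the training subset.
folds = kfold_indices(100, k=10)
```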
Let’s see how this works in this case. What we’re going to do is build polynomial regression models of order three, four, five, and six, and we’ll use tenfold cross-validation to evaluate their performance. I’ve built this sapply call here that will do that.
What it’s going to do is split the training data up into 10 subsets. Inside the implicit loop of the sapply call, it builds a model on every subset bar a particular one, and then evaluates that model on the held-out subset. You can, of course, attempt to replicate this yourselves to get some practice.
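If you do want to replicate it, the whole procedure looks roughly like this. The exercise does it with an sapply in R; the sketch below is a self-contained Python equivalent, and every name in it (`fit_poly`, `cv_total_sq_error`) as well as the tiny synthetic data at the bottom are assumptions for illustration, not the exercise’s actual code.

```python
import random

def fit_poly(xs, ys, order):
    """Least-squares polynomial fit via the normal equations, solved with
    Gaussian elimination. Returns coefficients for c[0] + c[1]*x + ... + c[order]*x^order."""
    m = order + 1
    # (X^T X) and (X^T y) for the Vandermonde design matrix.
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):  # elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, m))) / A[r][r]
    return coef

def predict(coef, x):
    return sum(c * x ** i for i, c in enumerate(coef))

def cv_total_sq_error(xs, ys, order, k=10, seed=1):
    """Tenfold CV: for each fold, fit on the other k-1 folds and accumulate
    the squared error on the held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        coef = fit_poly([xs[i] for i in train], [ys[i] for i in train], order)
        total += sum((ys[i] - predict(coef, xs[i])) ** 2 for i in held_out)
    return total

# Mirror of the sapply: total squared CV error for each candidate order,
# here on a tiny synthetic stand-in for the exercise's training split.
xs = [i / 29.0 for i in range(30)]
ys = [1.0 + 2.0 * x - 0.5 * x ** 2 for x in xs]
errors = {order: cv_total_sq_error(xs, ys, order) for order in (3, 4, 5, 6)}
```

The normal-equations solver is just to keep the sketch dependency-free; in practice you would use your environment’s built-in least-squares fit.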
The results, once again, are in terms of total squared error rather than mean squared error. Here we see the total squared error results for the four different models: third order, over 60 million; fourth order, a little over 50 million, at 50.18 million; fifth order, 50.20 million; and sixth order, 50.20 million. So the best polynomial regression model was clearly fourth order.
We wouldn’t normally inspect the results by eye. We’d automate it, allowing the computer to examine the total squared error results and find the best model, which we do here. Now we’re going to create a model of that order using the whole training set and see how it performs on the test data. What I want to do is compare how it performs on the test data to how it performed on the validation data, so I’m going to convert the total squared error into a mean squared error.
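That automated selection step amounts to an argmin plus a division. In the sketch below, the error figures are just the ballpark numbers read out above and `n_train` is an assumed training-set size, so treat it as an illustration of the arithmetic rather than the exercise’s actual values.

```python
# Total squared CV errors per polynomial order, echoing the figures above
# (illustrative ballpark values, not the exercise's exact output).
total_sq_err = {3: 60.0e6, 4: 50.18e6, 5: 50.20e6, 6: 50.20e6}

# Pick the order with the smallest total squared error.
best_order = min(total_sq_err, key=total_sq_err.get)

# Convert total squared error to mean squared error by dividing by the
# number of training points (n_train is an assumed value).
n_train = 10000
cv_mse = total_sq_err[best_order] / n_train
```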
And then also generate the mean squared error on the test data.
Now that we have the model’s performance on both the test data and the cross-validation data, it would be possible to perform a statistical test to check that the model is not doing significantly worse on the test data than it was on the validation data. If we can confirm that, then we can be confident that we didn’t merely select that model because of luck in how it performed on that particular validation data. At any rate, let’s see how our chosen model performs.
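The transcript only says such a test is possible, without naming one. One concrete option, an assumption on my part rather than what the exercise does, is Welch’s t statistic computed on the per-point squared errors from the test set versus those from the cross-validation hold-outs:

```python
import math

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances.
    Here a and b would be per-point squared errors on the test data and on
    the cross-validation hold-outs; |t| well above ~2 would suggest the
    model really is performing differently on the two sets."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))
```

For a proper p-value you would compare the statistic against a t distribution with Welch–Satterthwaite degrees of freedom; the raw statistic is enough for a quick sanity check.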
I’ve also worked out the standard deviation of the residuals on the test data, and I’m going to use that to create some confidence intervals.
And there we are: confidence intervals, exactly like we talked about in the article. I’m just plotting the regression curve plus or minus two standard deviations, where the standard deviation of the residuals was found from the test data. So the black line is our regression curve, and the red lines are our confidence intervals at two standard deviations.
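The band construction just described can be sketched as follows; again, the exercise itself is in R, and the function and argument names here are illustrative.

```python
import statistics

def conf_band(predict, xs_test, ys_test, x, n_sd=2.0):
    """Regression curve plus/minus n_sd standard deviations at point x,
    where the standard deviation is that of the residuals on the test data."""
    residuals = [y - predict(xv) for xv, y in zip(xs_test, ys_test)]
    sd = statistics.stdev(residuals)
    yhat = predict(x)
    return yhat - n_sd * sd, yhat, yhat + n_sd * sd
```

Evaluating `conf_band` over a grid of x values gives the black curve and the two red band lines of the plot.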