OK, well here's the first learning graph exercise: Learning Graphs Ex1.R. We'll jump right into it. We're working with the Boston Housing data again, so let's prepare that data. Now, to generate the learning graph, we're going to need to create models in a sequence using, say, 10%, 20%, 30%, up to 100% of the training data. What we'll do is fit OLS models. And we need to get the in sample and out of sample error estimates for each model. The in sample error is, of course, easy. It's just the mean squared error on the same data that the model was trained with. The out of sample error, we will get by doing 10-fold cross validation.
So here I generate a couple of helpful functions that will assist me in doing exactly this task. Here's one that will get the in sample error, here's one that will get the out of sample error. And both of them take as inputs the data that's going to be used. Then we'll run these functions, get them in the environment so that we can then estimate the in sample and out of sample error for ordinary least squares models created by using 10%, 20%, 30%, et cetera of the dataset. Let's plot that, and see what we have. Here we go.
As noted, this is not like an idealised learning graph. We see, for example, that the in sample error goes up and then, surprisingly, starts going down. And likewise, the out of sample error goes down but then occasionally starts going up. So there's noise here. But it gives us a very good idea of the amount of improvement that we can expect. We see a lot of improvement when we move up to about 100, then still significant improvement from 100 up to about, say, 350, and then the distance between the two lines stays pretty steady from that point on.
Learning Graphs Exercise 1
This is the first exercise on learning graphs. The associated code is in the Learning Graphs Ex1.R file. Interested students are encouraged to replicate what we go through in the video themselves in R, but note that this is an optional activity intended for those who want practical experience in R and machine learning.
We work through the process of creating a learning graph for OLS models on Boston housing data. We create a sequence of models from increasing proportions of the training data, and calculate the in-sample and out-of-sample error estimates for these models (using the training data, and cross-validation, respectively). We then plot this graph and discuss how it should be interpreted.
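The process described above can be sketched roughly as follows. This is a minimal illustration and not the contents of the course file: the helper names in_sample_mse and cv_mse, the random seed, the use of the full shuffled dataset as the training data, and the conversion of the chas column to numeric are all assumptions made for this example.

```r
library(mlbench)  # provides the BostonHousing data set

data(BostonHousing)
boston <- BostonHousing
# Assumption: chas is stored as a factor; convert it to 0/1 so that small
# subsets containing only one level do not make lm() fail.
boston$chas <- as.numeric(as.character(boston$chas))

set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
boston <- boston[sample(nrow(boston)), ]  # shuffle the rows once

# Hypothetical helper: in-sample error, i.e. the mean squared error of an
# OLS model on the same data it was fitted to.
in_sample_mse <- function(d) {
  fit <- lm(medv ~ ., data = d)
  mean(residuals(fit)^2)
}

# Hypothetical helper: out-of-sample error estimated by k-fold
# cross-validation (k = 10 by default).
cv_mse <- function(d, k = 10) {
  folds <- sample(rep(seq_len(k), length.out = nrow(d)))
  sq_err <- numeric(nrow(d))
  for (i in seq_len(k)) {
    held_out <- folds == i
    fit <- lm(medv ~ ., data = d[!held_out, ])
    pred <- predict(fit, newdata = d[held_out, ])
    sq_err[held_out] <- (d$medv[held_out] - pred)^2
  }
  mean(sq_err)
}

# Fit models on 10%, 20%, ..., 100% of the data and record both errors.
fractions <- seq(0.1, 1, by = 0.1)
sizes <- round(fractions * nrow(boston))
subsets <- lapply(sizes, function(n) boston[seq_len(n), ])
in_err <- sapply(subsets, in_sample_mse)
out_err <- sapply(subsets, cv_mse)

# Draw the learning graph: both error curves against training set size.
plot(sizes, out_err, type = "b", col = "red",
     ylim = range(c(in_err, out_err)),
     xlab = "Number of training cases", ylab = "Mean squared error")
lines(sizes, in_err, type = "b", col = "blue")
legend("topright", legend = c("Out of sample (10-fold CV)", "In sample"),
       col = c("red", "blue"), lty = 1, pch = 1)
```

Because the subsets and folds are chosen at random, the curves will be noisy and will change with the seed; the gap between the two lines is the quantity to watch as the training set grows.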
Note that the utils and mlbench R packages are used in this exercise. The utils package is included with a standard R installation; mlbench needs to be installed on your system. You can install packages using the install.packages function in R.
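As a small sketch of the setup step, the following installs mlbench only if it is missing and then loads the Boston Housing data. It assumes a CRAN mirror is configured and that the data set is the BostonHousing object provided by mlbench.

```r
# Install mlbench only when it is not already available.
# Assumption: a CRAN mirror is configured for install.packages().
if (!requireNamespace("mlbench", quietly = TRUE)) {
  install.packages("mlbench")
}

library(mlbench)
data(BostonHousing)  # loads the BostonHousing data frame
str(BostonHousing)   # inspect the columns; medv is the response used here
```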
© Dr Michael Ashcroft