OK. So here we are going to be looking at Gaussian processes, in the GP_EX1.R file. Once again, to allow interested students to get a grip on the mathematics, we’ve implemented a Gaussian process class. That’s at the bottom of the GP_EX1.R file. To use it, you’ll need to source the file. Interested students can go through it and have a look at how the mathematics is implemented. Otherwise, you can use it as you would any third-party implementation. There are three important functions– the GP function, which will create the Gaussian process model object.
The Predict function, which can be used to estimate new target variable values for new features. That will of course output not just an expected value, or mean, but also a variance for the output for the new features. And there’s a third function, Generate, which will sample a function from the Gaussian process model. OK. So let’s have a look at this example in the Gaussian process example function. As always, we’re going to start off by setting up our data and loading the required packages. This is just a synthetic data set, y versus x. Now we’re going to split our data into training and test sets.
And then we’re going to build a Gaussian process model from the training data. Now to build a Gaussian process model, we need a kernel, and we also need an L2 regularisation penalty. So here is the kernel. We’ll use a simple Laplacian kernel. And we’ll specify an arbitrary lambda value, 0.5. Now of course in reality, we’d need to spend quite a bit of time working out the L2 regularisation penalty we want to use for this problem, and also the kernel– or at least the kernel parameters –that are optimal for our problem. But here, we’re just going to take an arbitrary kernel and an arbitrary lambda. And then we can create the Gaussian process model.
So you see that the function is set up– the constructor function is set up so that you pass the formula. You want to estimate y based on x. You’ll find these variables in the training data set. Here’s the kernel I’m going to use. Here’s my L2 regularisation penalty. So we create the Gaussian process model. And we will calculate the mean squared error of the model for the test data. We’ll be calculating the mean squared error based on the mean values of the predictions. That is, the Gaussian process for any new set of features– a new feature vector –will produce a probability distribution, but we’re going to be calculating the mean squared error based on the mean of that outputted distribution.
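A rough sketch of that setup step in R follows. The exact signatures live in the GP class in GP_EX1.R, so treat the argument names here as assumptions rather than the class’s actual interface:

```r
# Sketch only: the Laplacian kernel and lambda = 0.5 are the arbitrary
# choices used in this example; GP() and its arguments are assumed to
# match the constructor described above.
laplacian_kernel <- function(x1, x2, sigma = 1) {
  exp(-sigma * sum(abs(x1 - x2)))  # k(x1, x2) = exp(-sigma * ||x1 - x2||_1)
}

# Estimate y from x, with variables found in the training data set,
# using the kernel and the L2 regularisation penalty.
model <- GP(y ~ x, data = train, kernel = laplacian_kernel, lambda = 0.5)
```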
We’re going to use the means to produce our regression curve.
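In code, that evaluation might look like the following, assuming Predict returns a mean and a variance for each test point (the field names are illustrative):

```r
# Predicted distributions for the test features; we keep only the means
# for the mean squared error.
pred <- Predict(model, test)          # assumed to return means and variances
mse  <- mean((test$y - pred$mean)^2)  # MSE based on the predicted means
```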
We could also calculate the log probability– that is, the log likelihood of the model given the test data. And that’s useful because you may well want to evaluate your statistical model, your Gaussian process, based on the log likelihood of the model given the test data rather than, say, the mean squared error.
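A sketch of that calculation, treating each prediction as a normal distribution with the predicted mean and variance (again, the structure of the Predict output is an assumption):

```r
# Log likelihood of the model given the test data: sum the log density
# of each observed y under its predicted normal distribution.
pred   <- Predict(model, test)  # assumed to return means and variances
loglik <- sum(dnorm(test$y,
                    mean = pred$mean,
                    sd   = sqrt(pred$variance),
                    log  = TRUE))
```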
Anyway, we can see the result of these two evaluation scores. The mean squared error of the Gaussian process model on the test data was 1.4. The log likelihood of the GP model on the test data was negative 8.4. Now of course as I said, if this was a real problem, we’d spend quite a lot of time optimising the kernel, the kernel parameters, and the L2 regularisation parameter based on our data. But for now, we’ll just continue with what we had. What we will do is create an alternative model for comparison. We’ll just create a nice simple ordinary least-squares model.
And we’ll see how well it does on the test data as well, both in terms of mean squared error, and in terms of log likelihood.
When we’re calculating the log likelihood, of course, we’re going to be making use of an error distribution based on the residuals of the model on the training data. There we go. So the mean squared error of the ordinary least-squares model on the test data was 12.5. That’s compared to 1.4 for the Gaussian process. So the Gaussian process is a lot better. The log likelihood of the OLS model on the test data was negative 16.9; the log likelihood of the Gaussian process model on the test data was negative 8.4.
And of course higher is better here, so the Gaussian process model appears to be the better of the two, based not only on the mean squared error, but also on the log likelihood. Now we’ll plot the models. First of all, we’ll plot the Gaussian process model.
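The baseline comparison can be sketched with base R as follows; the residual-based error distribution is the one described above:

```r
# Ordinary least-squares baseline on the same training data.
ols      <- lm(y ~ x, data = train)
ols_pred <- predict(ols, newdata = test)
ols_mse  <- mean((test$y - ols_pred)^2)

# For the log likelihood, estimate the error distribution from the
# residuals on the training data, then score the test observations.
sigma_hat  <- sd(residuals(ols))
ols_loglik <- sum(dnorm(test$y, mean = ols_pred, sd = sigma_hat, log = TRUE))
```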
And here we see that the blue curve is the regression curve, or the mean values of the Gaussian process for each x value in this range. And the red curves are confidence intervals, plus or minus 2 standard deviations.
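That plot might be produced along these lines, assuming Predict can be applied over a grid of x values and returns means and variances (an assumed interface, as before):

```r
# Blue regression curve (predicted means) with red bands at plus or
# minus 2 standard deviations, over a grid spanning the training x range.
grid <- data.frame(x = seq(min(train$x), max(train$x), length.out = 200))
pred <- Predict(model, grid)                   # assumed interface

plot(train$x, train$y, pch = 16)               # training data in black
points(test$x, test$y, col = "red", pch = 16)  # test data in red
lines(grid$x, pred$mean, col = "blue")
lines(grid$x, pred$mean + 2 * sqrt(pred$variance), col = "red")
lines(grid$x, pred$mean - 2 * sqrt(pred$variance), col = "red")
```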
For comparison, we’ll do the same thing with the ordinary least-squares model. Incidentally, the black points are the training data. The red points are the test data. So here we have the Gaussian process model. Here we have the ordinary least-squares model. We’ll do a third plot where– it will be just like the first plot, where we plotted the Gaussian process model –but we’ll also generate three functions from the Gaussian process model, and place them on the plot.
It takes some time to generate these functions. Why don’t we zoom in?
Here’s our first generated function in black, second in grey, third in green. So you see how we can actually sample functions from a Gaussian process, which can be very useful in certain Bayesian statistics applications.
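Overlaying the sampled functions might look like this, with the Generate interface assumed from its description (one sampled function over a set of x values):

```r
# Sample three functions from the Gaussian process and add them to the
# existing plot, in the colours mentioned above.
xs   <- seq(min(train$x), max(train$x), length.out = 200)
cols <- c("black", "grey", "green")
for (i in 1:3) {
  f <- Generate(model, xs)   # assumed: returns one sampled function over xs
  lines(xs, f, col = cols[i])
}
```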
So there we go. If you’re interested, try to replicate what we’ve done here. If you’re interested in the implementation of the mathematics, have a look through the Gaussian processes class. And you’ll find the Gaussian processes can be very useful tools in a wide range of applications.