This content is taken from The Open University & Persontyle’s online course, Advanced Machine Learning.

Skip to 0 minutes and 1 second OK. So here we are going to be looking at Gaussian processes in the GP Ex1.R file. Once again, to allow interested students to have a look at the implementation of the mathematics, and to get a grip on the mathematics, we’ve implemented a Gaussian process class. That’s at the bottom of the GP Ex1.R file. To use it, you’ll need to source the file. Interested students can go through it and have a look at how the mathematics is implemented. Otherwise, you can use it like any third-party implementation. There are three important functions. The first is the GP function, which will create the Gaussian process model object.

Skip to 0 minutes and 49 seconds The second is the Predict function, which can be used to estimate target variable values for new features. It will of course output not just an expected value, or mean, but also a variance for the output for the new features. And there’s a third function, Generate, which will sample a function from the Gaussian process model. OK. So let’s have a look at this example in the Gaussian process example function. As always, we’re going to start off by setting up our data and loading the required packages. This is just a synthetic data set, y versus x. Now we’re going to split our data into Training and Test.

Skip to 1 minute and 41 seconds And then we’re going to build a Gaussian process model from the Training data. Now to build a Gaussian process model, we need a kernel, and we also need an L2 regularisation penalty. So here is the kernel. We’ll use a simple Laplacian kernel. And we’ll specify an arbitrary lambda value, 0.5. Now of course in reality, we’d need to spend quite a bit of time working out the L2 regularisation penalty we want to use for this problem, and also the kernel, or at least the kernel parameters, that are optimal for our problem. But here, we’re just going to take an arbitrary kernel and an arbitrary lambda. And then we can create the Gaussian process model.

Skip to 2 minutes and 31 seconds So you see that the constructor function is set up so that you pass the formula. You want to estimate y based on x. You’ll find these variables in the Training data set. Here’s the kernel I’m going to use. Here’s my L2 regularisation penalty. So we create the Gaussian process model. And we will calculate the mean squared error of the model for the test data. We’ll be calculating the mean squared error based on the mean values of the predictions. That is to say, the Gaussian process for any new set of features, a new feature vector, will produce a probability distribution, but we’re going to be calculating the mean squared error based on the mean of that outputted distribution.
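The model-building step described here can be sketched in base R. This is a minimal illustration under stated assumptions, not the course’s GP class: the names `laplacian_kernel`, `gp_fit` and `gp_predict`, the kernel scale, and the synthetic data are all hypothetical, and the L2 penalty lambda is assumed to play the role of the observation noise variance added to the diagonal of the kernel matrix.

```r
# Minimal GP regression sketch in base R. All names here are hypothetical
# and do not reproduce the interface of the course's GP class.
set.seed(1)

# Synthetic data set, y versus x, split into training and test sets.
x <- seq(-3, 3, length.out = 60)
y <- sin(2 * x) + 0.5 * x + rnorm(length(x), sd = 0.3)
train_idx <- sample(seq_along(x), 40)
x_train <- x[train_idx]; y_train <- y[train_idx]
x_test  <- x[-train_idx]; y_test <- y[-train_idx]

# Laplacian kernel k(a, b) = exp(-|a - b| / scale); the scale is an assumption.
laplacian_kernel <- function(a, b, scale = 1) {
  exp(-abs(outer(a, b, "-")) / scale)
}

lambda <- 0.5  # arbitrary L2 penalty, treated here as the noise variance

# Fit: store the inverse of (K + lambda * I) and the weights alpha.
gp_fit <- function(x, y, kernel, lambda) {
  A <- kernel(x, x) + lambda * diag(length(x))
  list(x = x, kernel = kernel, lambda = lambda,
       A_inv = solve(A), alpha = solve(A, y))
}

# Predict: a mean and a variance for each new x value.
gp_predict <- function(fit, x_new) {
  K_star <- fit$kernel(x_new, fit$x)               # n_new x n_train
  mu <- as.vector(K_star %*% fit$alpha)
  prior_var <- diag(fit$kernel(x_new, x_new))
  # k(x*, x*) - k*' A^{-1} k*, plus lambda for the noise on observations
  sigma2 <- prior_var - rowSums((K_star %*% fit$A_inv) * K_star) + fit$lambda
  list(mean = mu, var = sigma2)
}

fit  <- gp_fit(x_train, y_train, laplacian_kernel, lambda)
pred <- gp_predict(fit, x_test)
gp_mse <- mean((y_test - pred$mean)^2)  # MSE based on the predictive means
```

The mean squared error uses only `pred$mean`, while `pred$var` carries the uncertainty that the predictive distribution attaches to each test point.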

Skip to 3 minutes and 22 seconds We’re going to use the means to produce our regression curve.

Skip to 3 minutes and 28 seconds We could also calculate the log probability, that is to say, the log-likelihood of the model given the test data. And that’s useful because you may well want to evaluate your statistical model, your Gaussian process, based on the log-likelihood of the model given the test data rather than, say, the mean squared error.
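The two evaluation scores can be written as short helper functions. The sketch below is illustrative only: the function names and the small numeric vectors are made up, and the log-likelihood is assumed to be the sum of independent Gaussian log densities, one per test point, using each point’s predictive mean and variance.

```r
# Hypothetical helpers; mu and sigma2 stand for predictive means and variances.
mse <- function(y, mu) mean((y - mu)^2)

# Sum of per-point Gaussian log densities (test points treated as independent).
log_lik <- function(y, mu, sigma2) {
  sum(dnorm(y, mean = mu, sd = sqrt(sigma2), log = TRUE))
}

# Made-up values, purely for illustration.
y_test <- c(1.2, -0.4, 0.9)
mu     <- c(1.0,  0.0, 1.1)
sigma2 <- c(0.5,  0.6, 0.5)

mse(y_test, mu)
log_lik(y_test, mu, sigma2)
```

Unlike the mean squared error, the log-likelihood depends on the predicted variances, so it penalises a model that is confidently wrong more heavily than one that is uncertain.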

Skip to 3 minutes and 53 seconds Anyway, we can see the result of these two evaluation scores. The mean squared error of the Gaussian process model on the test data was 1.4. The log-likelihood of the GP model on the test data was negative 8.4. Now of course as I said, if this was a real problem, we’d spend quite a lot of time optimising the kernel, the kernel parameters, and the L2 regularisation parameter based on our data. But for now, we’ll just continue with what we had. What we will do is create an alternative model for comparison. We’ll just create a nice simple ordinary least-squares model.

Skip to 4 minutes and 39 seconds And we’ll see how well it does on the test data as well, both in terms of mean squared error, and in terms of log-likelihood.

Skip to 4 minutes and 52 seconds When we’re calculating the log-likelihood, of course, we’re going to be making use of an error distribution based on the residuals of the model on the training data. There we go. So the mean squared error of the ordinary least-squares model on the test data was 12.5. That’s compared to 1.4 for the Gaussian process. So the Gaussian process is a lot better. The likelihood of the OLS model, or actually the log-likelihood of the OLS model, on the test data was negative 16.9. The log-likelihood of the Gaussian process model on the test data was negative 8.4.
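The OLS comparison can be sketched with `lm` from base R. The synthetic data and variable names below are assumptions, and the error distribution is taken to be a zero-mean Gaussian whose standard deviation is estimated from the training residuals, which is one plausible reading of the description above.

```r
# OLS baseline sketch; data and names are illustrative, not the course's.
set.seed(2)
x <- seq(-3, 3, length.out = 60)
y <- sin(2 * x) + 0.5 * x + rnorm(length(x), sd = 0.3)
train_idx <- sample(seq_along(x), 40)
train <- data.frame(x = x[train_idx],  y = y[train_idx])
test  <- data.frame(x = x[-train_idx], y = y[-train_idx])

ols <- lm(y ~ x, data = train)
ols_mean <- as.vector(predict(ols, newdata = test))

# Error distribution estimated from the residuals on the training data.
resid_sd <- sd(residuals(ols))

ols_mse <- mean((test$y - ols_mean)^2)
ols_log_lik <- sum(dnorm(test$y, mean = ols_mean, sd = resid_sd, log = TRUE))
```

Because the OLS model uses a single residual standard deviation for every test point, its log-likelihood reflects one global error estimate, whereas the Gaussian process supplies a separate variance per prediction.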

Skip to 5 minutes and 35 seconds And of course higher is better here, so the Gaussian process model appears to be the better of the two, based not only on the mean squared error, but also on the log-likelihood. Now we’ll plot the models. First of all, we’ll plot the Gaussian process model.

Skip to 5 minutes and 55 seconds And here we see that the blue curve is the regression curve, or the mean values of the Gaussian process for each x value in this range. And the red curves are confidence intervals, plus or minus 2 standard deviations.
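A plot of this kind can be drawn with base R graphics. In the sketch below the mean and standard deviation curves are stand-in values invented for illustration; in practice they would come from the model’s predictions over a grid of x values.

```r
# Stand-in curves, invented for illustration only.
grid <- seq(-3, 3, length.out = 100)
mu   <- sin(grid)                # stand-in for the predictive means
s    <- 0.2 + 0.1 * abs(grid)    # stand-in for the predictive standard deviations

upper <- mu + 2 * s              # plus 2 standard deviations
lower <- mu - 2 * s              # minus 2 standard deviations

# Blue regression curve with red bands at plus or minus 2 standard deviations.
plot(grid, mu, type = "l", col = "blue",
     ylim = range(lower, upper), xlab = "x", ylab = "y")
lines(grid, upper, col = "red")
lines(grid, lower, col = "red")
```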

Skip to 6 minutes and 17 seconds For comparison, we’ll do the same thing with the ordinary least-squares model. Incidentally, the black points are the training data. The red points are the test data. So here we have the Gaussian process model. Here we have the ordinary least-squares model. We’ll do a third plot. It will be just like the first plot, where we plotted the Gaussian process model, but we’ll also generate three functions from the Gaussian process model, and place them on the plot.

Skip to 6 minutes and 55 seconds It takes some time to generate these functions. Why don’t we zoom in?

Skip to 7 minutes and 3 seconds Here’s our first generated function in black, second in grey, third in green. So you see how we can actually sample functions from a Gaussian process, which can be very useful in certain Bayesian statistics applications.
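Sampling functions from a fitted Gaussian process can be sketched as drawing from a multivariate normal over a grid of x values. The code below is one common approach, using a Cholesky factor of the posterior covariance, and is not necessarily how the course’s Generate function works; the training points, kernel scale and jitter value are all assumptions.

```r
# Sketch: draw three functions from a GP posterior on a grid (base R only).
set.seed(3)
laplacian_kernel <- function(a, b, scale = 1) {
  exp(-abs(outer(a, b, "-")) / scale)
}

# Made-up training data and an arbitrary L2 penalty.
x_train <- c(-2, -1, 0, 1.5, 2.5)
y_train <- c(-0.5, -1.2, 0.1, 0.8, 1.9)
lambda  <- 0.5
grid    <- seq(-3, 3, length.out = 50)

A      <- laplacian_kernel(x_train, x_train) + lambda * diag(length(x_train))
K_star <- laplacian_kernel(grid, x_train)

# Posterior mean and covariance of the latent function on the grid.
post_mean <- as.vector(K_star %*% solve(A, y_train))
post_cov  <- laplacian_kernel(grid, grid) - K_star %*% solve(A, t(K_star))

# Symmetrise and add a small jitter so the Cholesky factorisation is stable.
post_cov <- (post_cov + t(post_cov)) / 2
L <- t(chol(post_cov + 1e-8 * diag(length(grid))))

# Each column of `samples` is one sampled function evaluated on the grid.
n_samples <- 3
z <- matrix(rnorm(length(grid) * n_samples), nrow = length(grid))
samples <- post_mean + L %*% z
```

Each column could then be added to an existing plot with `lines(grid, samples[, 1])` and so on. The cost of the Cholesky factorisation grows with the cube of the grid size, which is consistent with the generation step taking some time.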

Skip to 7 minutes and 26 seconds So there we go. If you’re interested, try to replicate what we’ve done here. If you’re interested in the implementation of the mathematics, have a look through the Gaussian process class. And you’ll find that Gaussian processes can be very useful tools in a wide range of applications.

Gaussian Processes Exercise

A video exercise for Gaussian processes. The associated code is in the GP Ex1.R file. Interested students are encouraged to replicate what we go through in the video themselves in R, but note that this is an optional activity intended for those who want practical experience in R and machine learning.

In this exercise we train a Gaussian process on labelled training data and evaluate its performance using MSE and log-likelihood on test data. We compare its performance with that of a basic OLS model. We also see how we can sample functions from a Gaussian process.

We have written a manual implementation of Gaussian processes. Interested students can examine it to see both the mathematics of this technique and how that mathematics is implemented. Other students can simply use the code as they would any third-party library. In either case, the code must be sourced before it can be used.

Note that the stats R package is used in this exercise. You will need to have it installed on your system. You can install packages using the install.packages function in R.

Please note that the audio quality of this video is lower than that of other videos in this course.
