# Gaussian Process regression

In this video Marcel Lüthi explains the mathematics behind Gaussian Process regression.
Welcome to week five of shape modelling. This week, you will learn how to use shape models to reconstruct a missing part of a shape. The mathematical technique we use for that is called Gaussian Process regression.
Gaussian Process regression is an inference method that lets you incorporate known deformations into a Gaussian Process model. Assume we are given a shape model, that is, a reference shape together with a model of shape deformations. The grey area you see here signifies a confidence region: it tells us where all the likely shapes could lie. We see that there is still quite a lot of variance everywhere. Now assume we are given the information that the point here, at the pinkie finger, moves here, and that the point at the thumb moves to this position.
Then, using Gaussian Process regression, we can include this information and obtain a new model. This model has no variance left at the points whose movement we know, here at the pinkie and here at the thumb, while the rest of the shape can still vary. So it is again a Gaussian Process model, with the property that all the shapes it can explain match the observations, that is, the known deformations that we put in.
The main mathematical tool we need for this is the conditional distribution of a multivariate normal distribution. Assume that you have a set of random variables, which I call here x₁ to xₙ and y₁ to yₘ, and that they are jointly normally distributed.
To simplify notation, we introduce capital X to refer to all the variables x₁ to xₙ, and capital Y to refer to the corresponding variables y₁ to yₘ.
Now, given this multivariate normal distribution and a set of observations of the random variables Y, which we call α̃ (alpha tilde), we can form the conditional distribution. That is, we ask: what is the distribution of the random variables X, given this knowledge about Y? The multivariate normal distribution has the very special property that this conditional distribution is again a multivariate normal distribution. Even better, its parameters μ̄ and Σ̄ (mu bar and sigma bar), the new mean and covariance matrix of the conditional distribution, are known in closed form. We see the formulas down here.
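For reference, in the notation introduced above, the joint distribution and the standard closed-form result for the conditional distribution of a multivariate normal are:

$$
\begin{pmatrix} X \\ Y \end{pmatrix} \sim \mathcal{N}\!\left(\begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix}\right),
\qquad
X \mid Y = \tilde{\alpha} \;\sim\; \mathcal{N}\bigl(\bar{\mu}, \bar{\Sigma}\bigr)
$$

$$
\bar{\mu} = \mu_X + \Sigma_{XY}\,\Sigma_{YY}^{-1}\bigl(\tilde{\alpha} - \mu_Y\bigr),
\qquad
\bar{\Sigma} = \Sigma_{XX} - \Sigma_{XY}\,\Sigma_{YY}^{-1}\,\Sigma_{YX}
$$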
These formulas may look a bit messy, but they are actually as nice as they could be, because they involve only matrix inversion, matrix multiplication, and addition or subtraction. So they are very simple to work with.
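As a minimal sketch (assuming NumPy; the function name `condition_mvn` is our own, not from the course), the closed-form conditional mean μ̄ = μ_X + Σ_XY Σ_YY⁻¹ (α̃ − μ_Y) and covariance Σ̄ = Σ_XX − Σ_XY Σ_YY⁻¹ Σ_YX can be computed with a few matrix operations. Instead of forming Σ_YY⁻¹ explicitly, the sketch solves a linear system, which is the usual numerically safer choice.

```python
import numpy as np

def condition_mvn(mu, sigma, idx_y, alpha):
    """Mean and covariance of X | Y = alpha for a jointly normal vector (X, Y).

    mu, sigma: mean vector and covariance matrix of the joint vector.
    idx_y: indices of the observed components Y; all other components form X.
    alpha: observed values for Y (the alpha tilde of the text).
    """
    n = len(mu)
    idx_y = np.asarray(idx_y)
    idx_x = np.setdiff1d(np.arange(n), idx_y)
    mu_x, mu_y = mu[idx_x], mu[idx_y]
    s_xx = sigma[np.ix_(idx_x, idx_x)]
    s_xy = sigma[np.ix_(idx_x, idx_y)]
    s_yy = sigma[np.ix_(idx_y, idx_y)]
    # mu_bar = mu_X + S_XY S_YY^-1 (alpha - mu_Y), via a linear solve.
    mu_bar = mu_x + s_xy @ np.linalg.solve(s_yy, alpha - mu_y)
    # sigma_bar = S_XX - S_XY S_YY^-1 S_YX, using S_YX = S_XY^T (symmetry).
    sigma_bar = s_xx - s_xy @ np.linalg.solve(s_yy, s_xy.T)
    return mu_bar, sigma_bar

# Bivariate example: unit variances, correlation 0.8, observe the second variable.
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
mu_bar, sigma_bar = condition_mvn(mu, sigma, [1], np.array([1.0]))
print(mu_bar, sigma_bar)
```

For this bivariate example, observing y = 1 should give a conditional mean of 0.8 and a conditional variance of 1 − 0.8² = 0.36, up to floating-point rounding.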
We will now apply this concept to shape modelling. Assume we are given a shape model, that is, a reference shape together with a Gaussian Process model of its deformations.
We call this model the prior model. Because we know that the prior model is a Gaussian Process, we also know what it looks like if we evaluate it at a finite number of discrete points: the corresponding discrete distribution is a multivariate normal distribution.
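To make the discretisation concrete, here is a sketch assuming NumPy, a zero prior mean, and a hypothetical squared-exponential kernel with made-up parameters (the kernel used in the course may differ). Evaluating the process at a finite set of points just means evaluating the mean function and the kernel at those points; the result is the mean vector and covariance matrix of a multivariate normal, from which we can draw samples. For simplicity, each deformation is treated as a scalar here.

```python
import numpy as np

def gaussian_kernel(a, b, scale=1.0, length=10.0):
    # Hypothetical squared-exponential kernel k(x, x') = scale * exp(-|x - x'|^2 / length^2),
    # evaluated between every point of a (n x 2) and every point of b (m x 2).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return scale * np.exp(-d2 / length**2)

rng = np.random.default_rng(0)
# Hypothetical reference points; in practice these would be points of the reference shape.
points = rng.uniform(0.0, 100.0, size=(50, 2))

# The prior evaluated at these points is the multivariate normal N(mean, K).
mean = np.zeros(len(points))
K = gaussian_kernel(points, points)

# Draw three samples of the (scalar) deformation values via a Cholesky factor.
# The small jitter on the diagonal keeps the factorisation numerically stable.
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(points)))
samples = mean + (L @ rng.standard_normal((len(points), 3))).T
print(samples.shape)  # (3, 50): one row per sampled deformation
```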
Here we have again grouped the variables into two parts: a set of variables u(x₁) to u(xₙ), and a set of variables u(y₁) to u(yₘ). Correspondingly, the mean and the covariance matrix have also been divided into these two parts.
Here we commit a kind of abuse of notation. We are modelling deformation fields, which means that each of these entries is actually a two-dimensional vector, with one component in the x direction and one in the y direction, but we have summarised it here as a single entry. The same holds for the covariance: each entry would actually be a 2 × 2 matrix. Writing that out would lead to too much clutter in the notation, and we are going to simplify it further anyway. As before, we introduce a simplified notation: u(X) refers to all the variables involving the points xᵢ, and u(Y) to all the random variables involving the points yᵢ.
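Written out without the abuse of notation, one common choice (assumed here, not taken from the slides) is a diagonal matrix-valued kernel k(x, x′) = g(x, x′) I₂, which models the x and y components of the deformation independently with the same scalar kernel g. In NumPy, the resulting 2n × 2n covariance matrix, with one 2 × 2 block per pair of points, can be built with a Kronecker product. The kernel and its parameters are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, scale=1.0, length=10.0):
    # Hypothetical scalar squared-exponential kernel g between two point sets (n x 2, m x 2).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return scale * np.exp(-d2 / length**2)

def matrix_valued_kernel(a, b):
    # Assumed diagonal matrix-valued kernel k(x, x') = g(x, x') * I_2.
    # The result consists of 2x2 blocks, with entries ordered as
    # [u_x(p1), u_y(p1), u_x(p2), u_y(p2), ...].
    return np.kron(gaussian_kernel(a, b), np.eye(2))

points = np.array([[0.0, 0.0], [5.0, 0.0], [20.0, 10.0]])
K = matrix_valued_kernel(points, points)
print(K.shape)          # (6, 6): a 2x2 block for every pair of points
block_01 = K[0:2, 2:4]  # 2x2 covariance between u(p1) and u(p2)
```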
Now assume we observe the deformations at the points Y. Here at the thumb we have an observation we call ũ₁, and here we have an observation ũ₂.
So we know that each u(yᵢ) equals a known deformation ũᵢ. Now we are in exactly the same setting as before, when we computed the conditional distribution of a multivariate normal distribution. We can apply the formula we saw a few slides before and derive the conditional distribution for this Gaussian Process, or rather for this discretised version of the Gaussian Process, which is still a multivariate normal distribution. We know its mean and covariance matrix in closed form; they are composed of the corresponding blocks.
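The conditioning step can be sketched as follows, assuming NumPy, a zero prior mean and a hypothetical diagonal squared-exponential kernel g(x, x′) I₂; all points, observed values and kernel parameters are made up for illustration. The observed 2D deformations ũᵢ are stacked into one vector, and the posterior mean and covariance at the points X follow from the blocks K(X, X), K(X, Y) and K(Y, Y).

```python
import numpy as np

def gaussian_kernel(a, b, scale=100.0, length=20.0):
    # Hypothetical scalar squared-exponential kernel between point sets (n x 2, m x 2).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return scale * np.exp(-d2 / length**2)

def kernel2d(a, b):
    # Assumed diagonal matrix-valued kernel g(x, x') * I_2, built from 2x2 blocks.
    return np.kron(gaussian_kernel(a, b), np.eye(2))

# Hypothetical points X where we predict, and points Y where deformations are observed.
X = np.array([[10.0, 0.0], [30.0, 0.0], [60.0, 0.0]])
Y = np.array([[0.0, 0.0], [80.0, 0.0]])
u_tilde = np.array([[0.0, -5.0], [3.0, 2.0]])  # hypothetical observed deformations at Y

# Blocks of the joint covariance of (u(X), u(Y)); the prior mean is zero.
K_xx, K_xy, K_yy = kernel2d(X, X), kernel2d(X, Y), kernel2d(Y, Y)
alpha = u_tilde.reshape(-1)  # [u_x(y1), u_y(y1), u_x(y2), u_y(y2)]

# mu_bar = mu_X + S_XY S_YY^-1 (alpha - mu_Y), with mu_X = mu_Y = 0.
mean_post = K_xy @ np.linalg.solve(K_yy, alpha)
# sigma_bar = S_XX - S_XY S_YY^-1 S_YX.
cov_post = K_xx - K_xy @ np.linalg.solve(K_yy, K_xy.T)
print(mean_post.reshape(-1, 2))  # predicted 2D deformation at each point of X
```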
Let’s look at the solution a bit more closely. If we look at the mean and interpret it again as a deformation, we see that in the neighbourhood of the point where we observed the value, the mean predicted by the conditional distribution is very close to the observed deformation. We also see that the observation is propagated to neighbouring points. So we can also infer, for the other points, where they are likely to move, and the most likely solution is given by this mean.
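The propagation of an observation to neighbouring points is easy to see in a one-dimensional toy version. This sketch assumes NumPy, a zero prior mean, a scalar deformation and a hypothetical squared-exponential kernel over positions t (think of positions along the contour). With a single observation, the posterior mean is k(t, y) k(y, y)⁻¹ ũ: it equals the observation at the observed point and decays towards the prior mean with distance.

```python
import numpy as np

def se_kernel(a, b, scale=1.0, length=10.0):
    # Hypothetical 1D squared-exponential kernel between position arrays a and b.
    return scale * np.exp(-(a[:, None] - b[None, :]) ** 2 / length**2)

t_obs = np.array([0.0])   # one observed position
u_obs = np.array([5.0])   # its observed (scalar) deformation
t = np.array([0.0, 5.0, 10.0, 20.0, 40.0])  # increasingly distant positions

# Posterior mean with zero prior mean: K(t, Y) K(Y, Y)^-1 u_obs.
mean = se_kernel(t, t_obs) @ np.linalg.solve(se_kernel(t_obs, t_obs), u_obs)
print(np.round(mean, 3))  # starts at the observed 5.0 and decays towards the prior mean 0
```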
It is also interesting to look at the covariance, which tells us how certain we are about a given deformation. We observe that where we have our observation, we are extremely certain. As we go further away, there is more and more variation. This is quite natural: far away from the observation we cannot be sure, because there are many hands that agree with the observation here, which moves the thumb down here. Some of them look like this one here, which is totally different from the mean solution. The beauty of this inference mechanism is that we have all this information.
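The growth of uncertainty with distance can be checked in the same kind of one-dimensional toy setting (NumPy, zero prior mean, a hypothetical squared-exponential kernel with prior variance 1). The posterior variance k(t, t) − k(t, y) k(y, y)⁻¹ k(y, t) is zero at the observed point and approaches the prior variance far away from it. Note that it does not depend on the observed value at all.

```python
import numpy as np

def se_kernel(a, b, scale=1.0, length=10.0):
    # Hypothetical 1D squared-exponential kernel with prior variance `scale`.
    return scale * np.exp(-(a[:, None] - b[None, :]) ** 2 / length**2)

t_obs = np.array([0.0])                      # the single observed position
t = np.array([0.0, 5.0, 10.0, 20.0, 40.0])   # increasingly distant positions

K_ty = se_kernel(t, t_obs)
K_yy = se_kernel(t_obs, t_obs)
# Posterior covariance k(t, t) - k(t, y) k(y, y)^-1 k(y, t).
cov = se_kernel(t, t) - K_ty @ np.linalg.solve(K_yy, K_ty.T)
var = np.diag(cov)
print(np.round(var, 3))  # 0 at the observation, approaching the prior variance 1 far away
```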
What we have done so far is compute this for a fixed discretisation. But the discretisation, that is, the number of points we chose, was completely arbitrary: we could have taken any number of points and would have obtained the same results at the points the discretisations share. If you think back to how we defined Gaussian Processes: a Gaussian Process is a process such that, whenever we discretise it and observe a finite number of points, the deformations at these points follow a joint normal distribution, irrespective of how many points we choose. Here we are in exactly this situation again. So what we have actually done, through this procedure, is define a new Gaussian Process.
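A small numerical check of this consistency, as a sketch under toy assumptions (NumPy, zero prior mean, a hypothetical 1D squared-exponential kernel, made-up observations): the posterior mean and variance at a point come out the same whether we discretise with just that point or with a fine grid that contains it.

```python
import numpy as np

def k(a, b, scale=1.0, length=10.0):
    # Hypothetical 1D squared-exponential kernel.
    return scale * np.exp(-(a[:, None] - b[None, :]) ** 2 / length**2)

def posterior(t, t_obs, u_obs):
    # Posterior mean and covariance at positions t (zero prior mean, noise-free observations).
    K_ty, K_yy = k(t, t_obs), k(t_obs, t_obs)
    mean = K_ty @ np.linalg.solve(K_yy, u_obs)
    cov = k(t, t) - K_ty @ np.linalg.solve(K_yy, K_ty.T)
    return mean, cov

t_obs = np.array([0.0, 30.0])
u_obs = np.array([5.0, -2.0])

coarse = np.array([7.0])              # discretisation with a single point
fine = np.linspace(0.0, 35.0, 36)     # step 1.0, so fine[7] == 7.0
m_c, c_c = posterior(coarse, t_obs, u_obs)
m_f, c_f = posterior(fine, t_obs, u_obs)
i = 7
print(m_c[0], m_f[i])          # same posterior mean at t = 7
print(c_c[0, 0], c_f[i, i])    # same posterior variance at t = 7
```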
Let’s look at what the mean function of such a process looks like. We have here a term which is constant: it only involves Y. This part here is also constant; it does not depend on any of the variables X. So only this part here depends on X. If we look at it closely, we see that we have done too much work, because capital X denotes a whole set of variables, but we are only interested in one. So we can replace the set of variables with a single variable x, and then we even get a nice interpretation of this expression.
This term is just the covariance function evaluated between the point x and all the observed points, and this part here is just the mean function evaluated at the point x. So we obtain a nice functional form of the mean, which we can compute for any point x, given the observations.
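In formulas, still in the simplified scalar notation (for 2D deformation fields each k(·, ·) is a 2 × 2 block), the posterior mean function obtained this way reads as follows, where ũ = (ũ₁, …, ũₘ) stacks the observations, μ_Y = (μ(y₁), …, μ(yₘ)), and Σ_YY has entries k(yᵢ, yⱼ):

$$
\bar{\mu}(x) = \mu(x) + \Sigma_{xY}\,\Sigma_{YY}^{-1}\bigl(\tilde{u} - \mu_Y\bigr),
\qquad
\Sigma_{xY} = \bigl(k(x, y_1), \dots, k(x, y_m)\bigr)
$$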
Exactly the same holds for the covariance function, or covariance matrix. I simply replace the set of variables capital X by single points x and x′, and I immediately get a formula which gives me the covariance for any pair of points on the reference. So we have a covariance function. We have now defined a new Gaussian Process whose mean function and covariance function we know in closed form. This process is called the posterior process, and it defines an entire distribution over deformation fields that match the given observations.
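Because the posterior mean and covariance are functions, the posterior process can be represented directly as a pair of Python functions that can be evaluated at any points, not just at a fixed discretisation. This is a sketch assuming NumPy, a 1D scalar toy setting, noise-free observations and a hypothetical squared-exponential prior; `make_posterior_process` is an illustrative helper of our own, not a library function. It implements μ̄(x) = μ(x) + K(x, Y) K(Y, Y)⁻¹ (ũ − μ(Y)) and k̄(x, x′) = k(x, x′) − K(x, Y) K(Y, Y)⁻¹ K(Y, x′).

```python
import numpy as np

def make_posterior_process(mean_fn, kernel_fn, y_obs, u_obs):
    """Return the mean and covariance functions of the posterior process.

    mean_fn(x) -> prior mean at points x; kernel_fn(a, b) -> prior covariance matrix.
    y_obs, u_obs: observed points and their noise-free deformation values.
    """
    K_yy = kernel_fn(y_obs, y_obs)
    # K(Y, Y)^-1 (u - mu(Y)) depends only on the observations, so compute it once.
    weights = np.linalg.solve(K_yy, u_obs - mean_fn(y_obs))

    def mean_p(x):
        return mean_fn(x) + kernel_fn(x, y_obs) @ weights

    def cov_p(x, x2):
        return kernel_fn(x, x2) - kernel_fn(x, y_obs) @ np.linalg.solve(K_yy, kernel_fn(y_obs, x2))

    return mean_p, cov_p

def zero_mean(x):
    return np.zeros(len(x))

def se_kernel(a, b, scale=1.0, length=10.0):
    # Hypothetical 1D squared-exponential prior covariance.
    return scale * np.exp(-(a[:, None] - b[None, :]) ** 2 / length**2)

mean_p, cov_p = make_posterior_process(
    zero_mean, se_kernel, np.array([0.0, 30.0]), np.array([5.0, -2.0]))
x = np.array([0.0, 12.5, 30.0])
print(mean_p(x))             # reproduces 5.0 and -2.0 at the observed points
print(np.diag(cov_p(x, x)))  # ~0 at the observed points, positive in between
```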
This is a nice result. The only small issue is that the vector fields all match the given observations perfectly.
Assuming perfect observations is quite unrealistic. In practice, whenever we observe a value we are always a bit uncertain, because there is usually noise on our observations. So we now add a noise term to our observations, to express that we are uncertain here and here. In formulas, this means that we add a noise term ε (epsilon) to the deformation value u(yᵢ), and the two together explain our observation. The model we assume for the noise term is again a normal distribution, with mean zero and variance σ² (sigma squared).
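Written out, assuming the noise is independent across observations and independent of the deformation u, the noise model and the resulting conditioning formulas are shown below. Compared with the noise-free case, the only change is that Σ_YY becomes Σ_YY + σ² I.

$$
\tilde{u}_i = u(y_i) + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}\bigl(0, \sigma^2 I\bigr)
$$

$$
\bar{\mu} = \mu_X + \Sigma_{XY}\bigl(\Sigma_{YY} + \sigma^2 I\bigr)^{-1}\bigl(\tilde{u} - \mu_Y\bigr),
\qquad
\bar{\Sigma} = \Sigma_{XX} - \Sigma_{XY}\bigl(\Sigma_{YY} + \sigma^2 I\bigr)^{-1}\Sigma_{YX}
$$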
This is now extremely easy to add to our Gaussian Process model, because the noise is additive and only enters the observations. It enters as this term down here: to the covariance of the observations, Σ_YY, we add σ² times the identity matrix. Since this is still a multivariate normal distribution, we can repeat the same procedure as before and again define a new Gaussian Process. The method we have just derived is called Gaussian Process regression.
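A minimal sketch of Gaussian Process regression in a 1D toy setting (NumPy, zero prior mean, a hypothetical squared-exponential kernel with prior variance 1, made-up observations and an assumed noise variance σ² = 0.1). The only change from the noise-free case is the σ² I added to K(Y, Y); `gp_regression` is our own illustrative helper.

```python
import numpy as np

def se_kernel(a, b, scale=1.0, length=10.0):
    # Hypothetical squared-exponential prior covariance (zero prior mean assumed).
    return scale * np.exp(-(a[:, None] - b[None, :]) ** 2 / length**2)

def gp_regression(x, y_obs, u_obs, sigma2):
    """Posterior mean and covariance at x for observations u_obs = u(y_obs) + noise."""
    K_xy = se_kernel(x, y_obs)
    # Noise enters only here: sigma^2 * I is added to the covariance of the observations.
    K_yy_noisy = se_kernel(y_obs, y_obs) + sigma2 * np.eye(len(y_obs))
    mean = K_xy @ np.linalg.solve(K_yy_noisy, u_obs)
    cov = se_kernel(x, x) - K_xy @ np.linalg.solve(K_yy_noisy, K_xy.T)
    return mean, cov

y_obs = np.array([0.0, 30.0])
u_obs = np.array([5.0, -2.0])
mean, cov = gp_regression(y_obs, y_obs, u_obs, sigma2=0.1)
print(mean)          # pulled towards the prior mean, no longer exactly [5, -2]
print(np.diag(cov))  # small but non-zero variance at the observed points
```

With this noise level, the posterior mean at the observed points is shrunk towards the prior mean, and the posterior variance there is small but positive (below σ²) instead of exactly zero.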
Its mean is the solution to a regression problem. As before, all likely deformations agree with the observations, but no longer perfectly: they agree to a degree that we can control with σ, our noise assumption. To summarise, we have just defined a new type of shape model, which we call a posterior shape model. Posterior shape models let us incorporate known deformations, and all the shapes we sample from them match these known observations. In the following exercises you will see what important applications this model has.
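To illustrate sampling from such a posterior model, here is a sketch in a 1D toy setting (NumPy, zero prior mean, a hypothetical squared-exponential kernel with prior variance 1, made-up observations, assumed σ² = 0.1). Samples cluster around the posterior mean at the observed points and spread out like the prior far from them. `Generator.multivariate_normal` is NumPy’s sampler for a multivariate normal.

```python
import numpy as np

def se_kernel(a, b, scale=1.0, length=10.0):
    # Hypothetical 1D squared-exponential prior covariance (prior variance 1).
    return scale * np.exp(-(a[:, None] - b[None, :]) ** 2 / length**2)

rng = np.random.default_rng(42)
y_obs = np.array([0.0, 30.0])     # hypothetical observed positions
u_obs = np.array([5.0, -2.0])     # hypothetical noisy observations there
sigma2 = 0.1                      # assumed noise variance
x = np.linspace(-20.0, 50.0, 71)  # step 1.0, so x[20] == 0.0 and x[50] == 30.0

# GP regression posterior mean and covariance on the grid (zero prior mean).
K_xy = se_kernel(x, y_obs)
K_yy = se_kernel(y_obs, y_obs) + sigma2 * np.eye(len(y_obs))
mean = K_xy @ np.linalg.solve(K_yy, u_obs)
cov = se_kernel(x, x) - K_xy @ np.linalg.solve(K_yy, K_xy.T)

# Each row is one sampled deformation on the grid; the tiny jitter guards
# against round-off making the covariance slightly indefinite.
samples = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(x)), size=2000)

obs_idx = [20, 50]
sd = samples.std(axis=0)
print(samples.mean(axis=0)[obs_idx], mean[obs_idx])  # empirical vs. posterior mean
print(sd[obs_idx], sd[0])  # small spread at the observations, about the prior spread at x = -20
```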

In many practical applications of shape modelling the goal is to infer the full shape from given partial observations. Gaussian Process regression is an inference technique that can be used to predict the unseen part of a shape from such information.

In this video you will learn the mathematics of Gaussian Process regression, how it can be used to predict the most likely shape from an observed part, and how you can obtain a new shape model which is constrained to match the observed part of the shape.