Skip to 0 minutes and 7 seconds One of the most common assumptions made in shape modelling is that the shape variations follow a normal distribution. In this video, we will introduce the multivariate normal distribution, discuss its properties, and present the main results that we will need in this course.

Skip to 0 minutes and 28 seconds To indicate that a random variable x follows a normal distribution, we write that x is distributed according to a normal distribution with parameters mu and sigma squared. When we want to stress explicitly that we refer to the density function, we write p of x equals N of mu, sigma squared. The parameter mu determines the location and is referred to as the mean of the distribution. Sigma squared, in turn, is the variance. It determines how spread out the values are around the mean.
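As a minimal sketch (the helper name `normal_pdf` is our own illustrative choice, not something from the course), the density referred to here is the standard formula $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, which can be evaluated with the Python standard library alone:

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x; sigma2 is the variance and must be positive.

    Hypothetical helper for illustration only.
    """
    if sigma2 <= 0:
        raise ValueError("the variance sigma2 must be positive")
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# Standard normal (mu = 0, sigma^2 = 1) evaluated at its mean: 1 / sqrt(2 * pi).
print(normal_pdf(0.0, 0.0, 1.0))
```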

Skip to 1 minute and 14 seconds On this slide, we see plots of the density function for different values of the parameters mu and sigma squared. In the top row, we see that changing the mean from 0 to 2 simply shifts the distribution. In the bottom row, we see how changing the variance actually influences the shape of the distribution. If the variance is small, the distribution is more peaked; if it is large, the distribution is flatter. Regardless of how we choose the parameters of this distribution, it always has the following three properties. First, the distribution is unimodal, which means that it has exactly one maximum. Second, it is symmetric and centred around the mean.
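These properties can be checked numerically. The following sketch redefines the same hypothetical `normal_pdf` helper so that it runs on its own, and verifies the shift, the peak height and the symmetry for a few arbitrarily chosen parameter values:

```python
import math

def normal_pdf(x, mu, sigma2):
    # Univariate normal density with mean mu and variance sigma2 (hypothetical helper).
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# Changing the mean from 0 to 2 only translates the curve: p(x + 2; 2, 1) == p(x; 0, 1).
shift_ok = all(math.isclose(normal_pdf(x + 2.0, 2.0, 1.0), normal_pdf(x, 0.0, 1.0))
               for x in (-3.0, -1.0, 0.0, 0.5, 2.0))

# The single peak (at the mean) has height 1/sqrt(2*pi*sigma2), so it drops as the variance grows.
peak_heights = {s2: normal_pdf(0.0, 0.0, s2) for s2 in (0.25, 1.0, 4.0)}

# Symmetry about the mean: p(mu + d) == p(mu - d).
symmetric = all(math.isclose(normal_pdf(2.0 + d, 2.0, 1.0), normal_pdf(2.0 - d, 2.0, 1.0))
                for d in (0.1, 1.0, 3.0))

print(shift_ok, symmetric, peak_heights)
```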

Skip to 2 minutes and 8 seconds And finally, values that are far away from the mean very quickly become unlikely. In shape modelling, we are usually not interested in modelling a single random variable. Rather, we always have a set of random variables that we want to model together. To indicate that a set of random variables x1 to xn follows a joint normal distribution, we arrange them into a random vector with the components x1 to xn, which we also simply refer to as x. We then say that x follows a normal distribution with mean vector mu and covariance matrix sigma. As before, the mean vector determines the location of the distribution. The covariance matrix, in turn, determines its shape.
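To make "very quickly become unlikely" concrete: for any normal distribution, the probability of a value lying within k standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$, roughly 68%, 95% and 99.7% for k = 1, 2, 3, and the density k standard deviations away is a factor $e^{-k^2/2}$ below its peak. A small sketch with illustrative helper names of our own:

```python
import math

def prob_within(k):
    # P(|x - mu| < k * sigma), the same for every normal distribution, via the error function.
    return math.erf(k / math.sqrt(2.0))

def relative_density(k):
    # p(mu + k * sigma) / p(mu): how much the density has dropped k standard deviations out.
    return math.exp(-0.5 * k * k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4), relative_density(k))
```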

Skip to 3 minutes and 5 seconds There is one additional requirement on the covariance matrix: it needs to be symmetric and positive definite.
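Whether a candidate matrix satisfies this requirement can be tested numerically. One common approach, sketched below under the assumption that NumPy is available, checks symmetry directly and then attempts a Cholesky factorisation, which exists only for positive definite matrices; `is_valid_covariance` is a hypothetical helper name:

```python
import numpy as np

def is_valid_covariance(cov, tol=1e-10):
    """Return True if cov is a symmetric, positive definite square matrix (hypothetical helper)."""
    cov = np.asarray(cov, dtype=float)
    if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
        return False
    if not np.allclose(cov, cov.T, atol=tol):
        return False
    try:
        # The Cholesky factorisation succeeds only for positive definite matrices.
        np.linalg.cholesky(cov)
    except np.linalg.LinAlgError:
        return False
    return True

print(is_valid_covariance([[2.0, 0.5], [0.5, 1.0]]))  # True
print(is_valid_covariance([[1.0, 2.0], [2.0, 1.0]]))  # False: eigenvalues are 3 and -1
```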

Skip to 3 minutes and 17 seconds The density function, which we also just denote by p of x, or, if we want to be very explicit, by p of x1 to xn, is given by the formula here. We can see that it consists of two factors. On the one hand, we have the normalisation factor, which simply ensures that the density integrates to 1. The second factor is an exponential whose argument is, up to a factor of minus one half, the squared distance of the point x from the mean. But rather than the standard Euclidean distance, this is a distance that takes the shape of the distribution into account, as modelled by the covariance matrix sigma. This distance is referred to as the Mahalanobis distance.
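The formula on the slide is not reproduced in this transcript; the standard density is $p(x) = (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$, where $(x-\mu)^T \Sigma^{-1} (x-\mu)$ is the squared Mahalanobis distance. A minimal NumPy sketch (function names are our own) that also shows how this distance differs from the Euclidean one:

```python
import numpy as np

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solving cov z = d gives z = cov^{-1} d without forming the inverse explicitly.
    return float(d @ np.linalg.solve(np.asarray(cov, dtype=float), d))

def mvn_pdf(x, mu, cov):
    """Normalisation factor times exp(-0.5 * squared Mahalanobis distance)."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))
    return norm * np.exp(-0.5 * mahalanobis_sq(x, mu, cov))

cov = [[4.0, 0.0], [0.0, 1.0]]
print(mahalanobis_sq([2.0, 0.0], [0.0, 0.0], cov))  # 1.0
print(mahalanobis_sq([0.0, 2.0], [0.0, 0.0], cov))  # 4.0
print(mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))   # 1 / (2 * pi), about 0.159
```

Both points lie at Euclidean distance 2 from the mean, but the first lies along the direction in which the distribution is more spread out (variance 4), so its Mahalanobis distance is smaller.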

Skip to 4 minutes and 14 seconds It is, in general, difficult to visualise a multivariate normal distribution. We can, however, do that if we restrict ourselves to only two random variables, say, x1 and x2. We refer to this case as the bivariate normal distribution.

Skip to 4 minutes and 33 seconds To define a bivariate normal distribution, we first need to define the mean of x1 and x2. We do this by specifying the respective component in the mean vector. From the plots here, we see that the mean also simply determines the location of the distribution.

Skip to 4 minutes and 56 seconds Further, we need to define the variances of x1 and x2. This is done by specifying the entries on the diagonal of the covariance matrix, sigma 1 squared and sigma 2 squared. As before, sigma 1 and sigma 2 determine how spread out the values are in each direction, as can also be seen from the plots shown on the slide. Finally, we have to specify the covariance. The covariance ties together the random variables x1 and x2: it determines how much their values change together. The covariance is closely related to the correlation, which is simply obtained by dividing the covariance by sigma 1 times sigma 2. For jointly normal random variables, a correlation of 0 means that the two random variables don't influence each other.
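Building the bivariate covariance matrix from sigma 1, sigma 2 and the correlation rho, and recovering the correlation from it, can be sketched as follows (hypothetical helper names, NumPy assumed). Since the determinant of this matrix is $\sigma_1^2\sigma_2^2(1-\rho^2)$, it is positive definite only when $\sigma_1, \sigma_2 > 0$ and $-1 < \rho < 1$, which the sketch checks:

```python
import numpy as np

def bivariate_cov(sigma1, sigma2, rho):
    """Covariance matrix of (x1, x2) from standard deviations and correlation (hypothetical helper)."""
    if sigma1 <= 0 or sigma2 <= 0 or not -1.0 < rho < 1.0:
        raise ValueError("need sigma1, sigma2 > 0 and -1 < rho < 1 for positive definiteness")
    c = rho * sigma1 * sigma2  # covariance = correlation * sigma1 * sigma2
    # Variances on the diagonal, covariance off the diagonal.
    return np.array([[sigma1 ** 2, c], [c, sigma2 ** 2]])

def correlation(cov):
    """Correlation = covariance divided by sigma1 * sigma2."""
    cov = np.asarray(cov, dtype=float)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

cov = bivariate_cov(2.0, 0.5, -0.7)
print(cov)
print(correlation(cov))
```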

Skip to 5 minutes and 52 seconds They can change their values completely independently. However, if the correlation between the two random variables is positive, their values influence each other: if we observe a large value of x1, we're likely to also observe a large value of x2. If the correlation is negative, a large value of x1 usually coincides with a small value of x2, and vice versa. Sometimes we are given a full joint distribution, but we're only interested in a subset of the variables. In our example of a bivariate normal distribution of x1 and x2, we might, for example, ask: what is the distribution of x1 alone?
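The effect of the sign of the correlation can be seen by sampling. The sketch below assumes NumPy and uses `numpy.random.Generator.multivariate_normal` with a fixed seed; the zero means, unit variances, sample size and the threshold x1 > 1 are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

def sample_bivariate(rho, n=50_000):
    # Zero means and unit variances, so the covariance equals the correlation rho.
    mean = np.zeros(2)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(mean, cov, size=n)

results = {}
for rho in (0.8, 0.0, -0.8):
    s = sample_bivariate(rho)
    empirical_rho = np.corrcoef(s[:, 0], s[:, 1])[0, 1]
    # Average x2 over the samples where x1 is more than one standard deviation above its mean.
    mean_x2_when_x1_large = s[s[:, 0] > 1.0, 1].mean()
    results[rho] = (empirical_rho, mean_x2_when_x1_large)
    print(f"rho={rho:+.1f}  empirical={empirical_rho:+.3f}  mean x2 given x1 > 1: {mean_x2_when_x1_large:+.2f}")
```

With a positive correlation the average of x2 is clearly positive when x1 is large; with a negative correlation it is clearly negative; with zero correlation it stays near 0.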

Skip to 6 minutes and 44 seconds We can ask this question in two different settings. First, what is the distribution of x1 alone if we don't know anything about the value of x2? Second, what is the distribution of x1 if we have actually observed a specific value of x2? The first question leads to the marginal distribution, p of x1. The second question leads to the conditional distribution, which we write as p of x1 given that x2 equals some value x tilde 2. It turns out that for normal distributions, both questions have a really simple and satisfying answer. We will present this result directly for the general case of a multivariate normal distribution.
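In the bivariate case, the answers to both questions have a well-known closed form: the marginal of x1 is $\mathcal{N}(\mu_1, \sigma_1^2)$, and the conditional is $\mathcal{N}\left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(\tilde{x}_2 - \mu_2),\ \sigma_1^2(1-\rho^2)\right)$. This is the special case of the general block formulas with one variable in each block. A plain-Python sketch with hypothetical function names:

```python
def marginal_x1(mu, sigma1):
    # Marginal of x1: ignore x2 entirely and keep the mean and variance of x1.
    return mu[0], sigma1 ** 2

def conditional_x1_given_x2(mu, sigma1, sigma2, rho, x2_obs):
    # Observing x2 shifts the mean of x1 in proportion to rho and shrinks its variance by (1 - rho^2).
    mean = mu[0] + rho * sigma1 / sigma2 * (x2_obs - mu[1])
    var = sigma1 ** 2 * (1.0 - rho ** 2)
    return mean, var

print(marginal_x1((0.0, 0.0), 1.0))                             # (0.0, 1.0)
print(conditional_x1_given_x2((0.0, 0.0), 1.0, 1.0, 0.8, 1.0))  # mean 0.8, variance 0.36 up to rounding
```

With rho equal to 0, the conditional reduces to the marginal, matching the statement that uncorrelated jointly normal variables do not influence each other.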

Skip to 7 minutes and 36 seconds So let's say we have an entire set of variables, x1 to xn and y1 to ym, which are all jointly normally distributed. We can directly see that we can partition the mean vector into the entries involving only x and those involving only y. Similarly, the covariance matrix can be partitioned into blocks. Some blocks involve only the variables x, some involve only the variables y, and the off-diagonal blocks involve both x and y.
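Partitioning the mean vector and covariance matrix amounts to array slicing. Assuming the x variables come first and using NumPy, a hypothetical `partition` helper could look like this:

```python
import numpy as np

def partition(mu, cov, n_x):
    """Split mean and covariance into x and y blocks; the first n_x entries are assumed to be x."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    mu_x, mu_y = mu[:n_x], mu[n_x:]
    S_xx, S_xy = cov[:n_x, :n_x], cov[:n_x, n_x:]  # blocks in the rows of x
    S_yx, S_yy = cov[n_x:, :n_x], cov[n_x:, n_x:]  # blocks in the rows of y
    return mu_x, mu_y, S_xx, S_xy, S_yx, S_yy

mu = [1.0, 2.0, 3.0]
cov = [[2.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.5]]
for block in partition(mu, cov, n_x=1):
    print(block)
```

Because the covariance matrix is symmetric, the two off-diagonal blocks are transposes of each other.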

Skip to 8 minutes and 16 seconds This leads to a simplified notation in which we summarise the blocks: the entries of the mean vector corresponding to x we simply refer to as mu x, and so on. With this notation, we can now state the two important results. It turns out that both the marginal distribution and the conditional distribution are, again, normal distributions. The marginal distribution has a particularly simple form: it is obtained by taking the blocks of the mean vector and of the covariance matrix that involve only x. This means that we completely ignore all the information about y when computing the marginal distribution. The conditional distribution, which we see here, looks a little more complicated.
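The slide formulas are not reproduced in this transcript. In block notation (the symbol names below are our choice and may differ from the slides), the standard results for a jointly normal vector read:

$$
\begin{pmatrix} x \\ y \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix} \right), \qquad \Sigma_{yx} = \Sigma_{xy}^T
$$

$$
p(x) = \mathcal{N}(\mu_x, \Sigma_{xx})
$$

$$
p(x \mid y = \tilde{y}) = \mathcal{N}\left( \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(\tilde{y} - \mu_y),\; \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \right)
$$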

Skip to 9 minutes and 13 seconds These formulas involve quite a few terms. But if we look closely, all these terms simply correspond to the individual blocks, and we only need basic linear algebra operations: matrix and matrix-vector multiplications and computing one matrix inverse. So all these computations are really easy to carry out. To summarise, we can define a multivariate normal distribution by specifying a mean vector and a covariance matrix. The multivariate normal distribution is a very flexible distribution: we can specify n parameters to determine its location and, because the covariance matrix is symmetric, n times n plus 1 divided by 2 parameters to determine its shape. On the other hand, it is also quite a restricted distribution because, regardless of how we choose these parameters, the distribution remains unimodal and symmetric.
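The conditional mean $\mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(\tilde{y} - \mu_y)$ and covariance $\Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$ translate almost line by line into code. The following NumPy sketch (hypothetical names; x variables assumed to come first) uses `numpy.linalg.solve` instead of forming $\Sigma_{yy}^{-1}$ explicitly, which gives the same result and is numerically preferable; it also counts the free parameters mentioned in the summary:

```python
import numpy as np

def condition_on_y(mu, cov, n_x, y_obs):
    """Mean and covariance of p(x | y = y_obs); the first n_x entries are assumed to be x."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    mu_x, mu_y = mu[:n_x], mu[n_x:]
    S_xx, S_xy = cov[:n_x, :n_x], cov[:n_x, n_x:]
    S_yx, S_yy = cov[n_x:, :n_x], cov[n_x:, n_x:]
    # K = S_xy S_yy^{-1}, obtained by solving S_yy Z = S_yx and transposing (S_yy is symmetric).
    K = np.linalg.solve(S_yy, S_yx).T
    cond_mean = mu_x + K @ (np.asarray(y_obs, dtype=float) - mu_y)
    cond_cov = S_xx - K @ S_yx
    return cond_mean, cond_cov

def num_parameters(n):
    """Free parameters of an n-dimensional normal: n for the mean, n(n+1)/2 for the covariance."""
    return n, n * (n + 1) // 2

mean, cov_c = condition_on_y([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], n_x=1, y_obs=[1.0])
print(mean, cov_c)        # conditional mean about 0.8, variance about 0.36
print(num_parameters(3))  # (3, 6)
```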

Skip to 10 minutes and 14 seconds Finally, we have shown that the marginal and conditional distributions are, again, normal distributions. We will use these properties and results extensively in the rest of this course.

# Multivariate normal distribution

In this course we will make the assumption that shape variations can be modelled using a normal distribution.

In this video we will review the basics of multivariate normal distributions and discuss the most important properties and results that we will need in this course.

© University of Basel