4.15

## The Open University

Skip to 0 minutes and 1 second All right. So now we're going to look at the second of the two missing data example exercises. This time we're going to be using Markov chain Monte Carlo. In particular, we're going to be using the Metropolis within Gibbs missing data algorithm that we described in the article.

Skip to 0 minutes and 24 seconds Exactly like in the first exercise, we'll use this as an opportunity to look at a few other things. In fact, we're going to look at the same sorts of things that we did in the first missing data example. Because we're going to be generating a weighted data set again, we'll use the linear regression formula to create some linear regression models from this weighted data set. We'll again create an additive and a multiplicative one and compare them, and we'll compare them statistically like we did last time, except that instead of the BIC score we'll use the AIC score. OK. Let's look at the code. Now there is an important difference this time when we prepare the data.

Skip to 1 minute and 5 seconds It's the same data as last time, except that now it's not only the type variable that has some missing values, it's also the income variable. So we have missing values not only in a discrete or nominal variable, but also in a real-valued variable. That makes the expectation maximisation algorithm less appealing to use, which is why we will use the MCMC alternative. Once again, we'll start off by finding which rows and columns contain missing values.
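The step of locating the missing entries can be sketched as follows. This is a minimal illustration, not the code used in the video: the rows, their values, and the `y` response column are made up, and `None` is assumed to mark a missing value.

```python
# Toy rows with made-up values; None marks a missing entry (an assumption).
rows = [
    {"income": 12.3, "type": "prof", "y": 68.8},
    {"income": None, "type": "bc", "y": 35.2},
    {"income": 4.0, "type": None, "y": 41.5},
    {"income": 8.9, "type": "wc", "y": 47.1},
]


def find_missing(rows):
    """Return sorted (row_index, column_name) pairs for every missing entry."""
    return sorted(
        (i, col)
        for i, row in enumerate(rows)
        for col, value in row.items()
        if value is None
    )


missing = find_missing(rows)
missing_rows = sorted({i for i, _ in missing})      # rows needing samples
missing_cols = sorted({col for _, col in missing})  # variables to sample
print(missing, missing_rows, missing_cols)
```

Knowing both the rows and the columns matters here because a discrete column such as `type` and a real-valued column such as `income` are typically sampled in different ways.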

Skip to 1 minute and 52 seconds And then we will use Metropolis within Gibbs. Now, once again, we've actually created our own manual version that you can find at the bottom of this code. You can go through it yourself if you're interested in looking at the mathematical implementation, or you can just use it as a third-party implementation. We need to source the file to be able to use it, so we do that. Now, to use an MCMC algorithm, you need to specify both the burn and the samples. The burn is the number of samples that we throw away at the beginning because they are contaminated by the initial random values.

Skip to 2 minutes and 30 seconds And then the samples is the number of samples we collect after the burn to make use of in estimating the probability distributions of the variables of interest. In this case, those are the missing values in our features. Now this is just a simple implementation of Metropolis within Gibbs. In more complicated ones, there will be additional parameters that you may need to specify. We will just need these two. So let's run our manual implementation.

Skip to 3 minutes and 2 seconds And here we're generating the samples. We've gone through generating the 100 burn samples. And now it has generated the 200 samples we'll use.
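To make the roles of the burn and the samples concrete, here is a minimal Metropolis within Gibbs sketch for one row whose type and income are both missing. It is not the course's implementation, which the transcript does not show. The toy model (`PRIOR`, `INCOME_MEAN`, `INTERCEPT`, `SLOPE`, the standard deviations, the `step` size, and the observed response `y`) is entirely hypothetical. The discrete variable is drawn from its exact conditional (a Gibbs step), and the real-valued variable is updated with a random-walk Metropolis step.

```python
import math
import random

# Hypothetical toy model: every number here is an illustrative assumption.
TYPES = ["bc", "wc", "prof"]
PRIOR = {"bc": 0.5, "wc": 0.3, "prof": 0.2}
INCOME_MEAN = {"bc": 5.0, "wc": 6.0, "prof": 10.0}
INTERCEPT = {"bc": 30.0, "wc": 35.0, "prof": 50.0}
SLOPE, INCOME_SD, NOISE_SD = 2.0, 2.0, 5.0


def log_normal_pdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_joint(income, kind, y):
    """Log of p(type) * p(income | type) * p(y | income, type)."""
    return (
        math.log(PRIOR[kind])
        + log_normal_pdf(income, INCOME_MEAN[kind], INCOME_SD)
        + log_normal_pdf(y, INTERCEPT[kind] + SLOPE * income, NOISE_SD)
    )


def metropolis_within_gibbs(y, burn, samples, step=1.0, seed=0):
    rng = random.Random(seed)
    # Random starting values; the burn exists to forget these.
    income, kind = rng.uniform(0.0, 20.0), rng.choice(TYPES)
    kept = []
    for i in range(burn + samples):
        # Gibbs step: draw the discrete variable from its exact conditional.
        logs = [log_joint(income, k, y) for k in TYPES]
        top = max(logs)
        weights = [math.exp(v - top) for v in logs]
        kind = rng.choices(TYPES, weights=weights)[0]
        # Metropolis step: symmetric random-walk proposal for the income.
        proposal = income + rng.gauss(0.0, step)
        log_ratio = log_joint(proposal, kind, y) - log_joint(income, kind, y)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            income = proposal
        if i >= burn:  # keep only the draws made after the burn
            kept.append((income, kind))
    return kept


draws = metropolis_within_gibbs(y=70.0, burn=100, samples=200)
print(len(draws))
```

With `burn=100` and `samples=200`, the loop runs 300 iterations and keeps only the last 200, which matches the counts mentioned in the video.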

Skip to 3 minutes and 14 seconds Let's have a quick look at them.

Skip to 3 minutes and 19 seconds What we have here are values for the missing items in our features, 200 instances of each.

Skip to 3 minutes and 37 seconds Now, once again, what we're going to want to do is to create a new weighted data set where all the rows of the original data set that did not contain a missing item will be given a weight of 1. Then those 200 samples will replace the rows with missing items, and each of those will be given a weight of 1 over 200. Once we've done that, we've got a weighted data set that we can then plug into any supervised learning algorithm that accepts weighted data sets. We will use linear regression models exactly like last time, using an additive and a multiplicative version. So two linear regression models, one with an additive formula, one with a multiplicative formula.
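The construction of the weighted data set can be sketched as below. The function name `build_weighted_dataset`, the row layout, and the values are hypothetical; only the weighting scheme (1 for complete rows, 1 over the number of samples for each sampled completion) comes from the description above.

```python
def build_weighted_dataset(rows, samples_by_row):
    """Complete rows get weight 1; each sampled completion gets weight 1/n."""
    weighted = []
    for i, row in enumerate(rows):
        if i not in samples_by_row:
            weighted.append((dict(row), 1.0))
            continue
        draws = samples_by_row[i]
        for fill in draws:
            completed = dict(row)
            completed.update(fill)  # overwrite the missing entries
            weighted.append((completed, 1.0 / len(draws)))
    return weighted


# Made-up rows; None marks a missing entry.
rows = [
    {"income": 12.3, "type": "prof", "y": 68.8},
    {"income": None, "type": None, "y": 41.5},
]
# Two hypothetical draws for row 1 (the video uses 200 per row).
samples_by_row = {
    1: [{"income": 5.1, "type": "bc"}, {"income": 6.4, "type": "wc"}],
}

weighted = build_weighted_dataset(rows, samples_by_row)
total_weight = sum(w for _, w in weighted)
print(len(weighted), total_weight)
```

The weights of the completions of one incomplete row sum to 1, so each original row still contributes one row's worth of weight to the fitted model.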

Skip to 4 minutes and 37 seconds Once we've done that, we can work out the log probabilities of the data given the model, which is to say the log likelihood of the model given the data. And we can calculate the AIC scores using the formula from week two in the evaluation of statistical models article. We can output that to the console, though like last time it gets a bit messy. So maybe we could simply output the AIC scores for the two models directly. AIC 1, AIC 2. Smaller is better. So, just like last time, it turns out that the additive model outperforms the multiplicative model, no doubt by virtue of its simplicity and the small data set.
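As a rough sketch of the scoring step, the snippet below uses the standard definition AIC = 2k - 2 ln L, which the course's week two formula is assumed to match. The Gaussian noise model, the function names, the parameter counts, and the log likelihood values are illustrative assumptions, not the figures from the video.

```python
import math


def weighted_gaussian_log_likelihood(residuals, weights, sd):
    """Weighted sum of Gaussian log densities of the residuals."""
    return sum(
        w * (-0.5 * math.log(2 * math.pi * sd * sd) - r * r / (2 * sd * sd))
        for r, w in zip(residuals, weights)
    )


def aic(log_likelihood, n_params):
    """Standard AIC: 2k - 2 ln L. Smaller is better."""
    return 2 * n_params - 2 * log_likelihood


# Made-up log likelihoods and parameter counts, for illustration only.
aic_additive = aic(log_likelihood=-100.0, n_params=5)
aic_multiplicative = aic(log_likelihood=-99.0, n_params=7)
print(aic_additive, aic_multiplicative)
```

In this made-up case the multiplicative model fits slightly better but pays for its two extra parameters, so its AIC is larger. That is the kind of trade-off the comparison is meant to expose.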

Skip to 5 minutes and 32 seconds So the AIC and the BIC appear to agree on this. Now let's also plot the two resulting models just like we did in the last exercise.

Skip to 5 minutes and 49 seconds Let's zoom in because the graphs are a little small in this resolution. OK. So just like last time, with the additive formula the lines have different intercepts but the same slope. With the multiplicative formula, they have different intercepts and different slopes. Just like last time, the additive formula performed better according to our statistical evaluation. And just like last time, the colours of the points correspond to the type: blue collar is red, white collar is green, and professional is blue.
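The difference between the two fitted shapes can be sketched as follows, assuming a single numeric predictor (income) and a three-level type variable. All coefficients are hypothetical and chosen only to show the structure, not taken from the fitted models.

```python
def additive_prediction(income, kind, intercepts, slope):
    """Additive formula: one intercept per type, one shared slope."""
    return intercepts[kind] + slope * income


def multiplicative_prediction(income, kind, intercepts, slopes):
    """Multiplicative formula: one intercept and one slope per type."""
    return intercepts[kind] + slopes[kind] * income


# Hypothetical coefficients for illustration.
intercepts = {"bc": 30.0, "wc": 35.0, "prof": 50.0}
shared_slope = 2.0
slopes = {"bc": 1.5, "wc": 2.0, "prof": 0.5}

for kind in intercepts:
    rise_add = (additive_prediction(6.0, kind, intercepts, shared_slope)
                - additive_prediction(5.0, kind, intercepts, shared_slope))
    rise_mul = (multiplicative_prediction(6.0, kind, intercepts, slopes)
                - multiplicative_prediction(5.0, kind, intercepts, slopes))
    print(kind, rise_add, rise_mul)
```

Under the additive formula the rise per unit of income is the same for every type, giving parallel lines. Under the multiplicative formula each type gets its own slope, at the cost of extra parameters.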