This content is taken from the Royal Holloway, University of London's online course, Survival Statistics: Secrets for Demystifying Numbers.
2.19

## Royal Holloway, University of London

Skip to 0 minutes and 9 seconds

But for example, what can we do about this issue that we can't do constituency-by-constituency polling in the UK? That seems like a pretty serious problem.

Yeah, because that's the outcome that people want to know. People don't want to know what share of the vote will this party get. They want to know how many seats will they get. Will they get a majority or not? So one of the ways that people are trying to get around this is something called multilevel regression and post-stratification, and that's a way of taking a large national sample with information not just on how people said they'd vote but also on some of their demographic characteristics.

Skip to 0 minutes and 56 seconds

So it might be their age, their gender, let's say their highest level of education, whether they own their accommodation or whether they rent it. Those might be four good demographic characteristics, and with those characteristics we can make a pretty good model-based prediction of how they might vote. So if we take the most recent UK election, young renters were really likely to vote for the Labour Party. They thought the Labour Party was going to give them a good deal. And so whenever we see a young renter in the data, the probability that they have of voting Labour might be quite high. So that stage, that's our multilevel regression stage.

Skip to 1 minute and 45 seconds

It's basically about mapping from different voter types to probabilities of some voting behaviour. The next stage, the post-stratification stage, relies a lot on things like the census. So in the UK at least, the census reports lots of information about individual constituencies: the proportions of people that own or rent, the age pyramid for that constituency. So we can get a pretty good idea of just how many female renters aged 18 to 25 there are-- say it's 3% of our constituency. We can make a prediction for them, and we can just repeat that for all of the different voter types and add up the numbers.

Skip to 2 minutes and 42 seconds

So we have a model which tells us how different types of people will vote and we have the census which tells us how many people of each type there are in each area. And if we combine the two, we can get an estimate for how that area will vote.

So basically what I hear you saying is that you can do a national survey and the national survey turns up various pieces of information. For example, young renters break heavily for Labour. You can then go and say, OK, so here is a particular catchment area, here you've got an election for an MP in this constituency, and that constituency happens to be loaded with young renters.

Skip to 3 minutes and 34 seconds

So that is-- it's just one factor, but that's a factor that makes me think this one is probably going to break Labour.

Yeah, and if I explain that to you in a different way, if I give you an example of university towns, lots of young renters there, areas that typically break Labour. So that's just a different description. I didn't-- I just told you that fact. I didn't introduce anything about the census or different voter types. So if I give you that description in a different way, you accept it. But because it's got multilevel regression and post-stratification and it's got all this machinery behind it, it's a little bit more involved.

Right.

Skip to 4 minutes and 14 seconds

So yeah, so leaving aside all that terminology, I think that the main power of this is that you can make credible constituency-by-constituency predictions without doing constituency-by-constituency polling. And the key that, kind of, unlocks that potential is that you've got information on the demographic makeup or the characteristics of each constituency. And then you're making assumptions like a young renter in Manchester is probably going to behave similarly to a young renter in Surrey.

Yeah, that's right. So some of the patterns are there in the data. So young renters generally might demonstrate one particular kind of behaviour, but we do have to make some assumptions about those effects being constant across all areas.

Skip to 5 minutes and 21 seconds

We can't allow each voter type to behave differently in each area. That would-- that would be a, kind of, anything goes model. So we have to impose some restrictions on how the model works, but they're fairly innocuous ones. So it's a kind of technique that seems to work really well. In the last UK general election, YouGov had a model of this kind. And people looked at it the first day it came out and they laughed at it, because they thought this result didn't seem anything like what they expected: I expect a big Conservative majority, and you guys are telling me that the Conservatives will fall short.

Skip to 6 minutes and 7 seconds

They got laughed at, and in the end the team behind that model were the ones who were laughing, because their model was very, very accurate. And it did pick up on just some of those patterns of younger areas switching to Labour in a way that surprised some people-- some areas which had only ever voted Conservative in the past switching just because of that age effect.

So thank you very much, Chris. It's been a great interview, and I'm sure the students will enjoy it very much.

# On the Cutting Edge

The methods that Chris and I discuss in this clip are another promising direction for the survey and polling industry.

They constitute a serious response to at least two of the main pitfalls for election prediction that we have encountered this week. They enable pollsters to get credible constituency-by-constituency estimates even:

• without serious constituency-by-constituency polling, which is prohibitively expensive to collect
• when most people refuse to engage with their surveys

There is one key assumption and one key data requirement that enable the use of these methods.

First, we need to be able to classify people into groups that display reliable within-group voting patterns. These groups may be pretty complicated amalgams of a wide array of characteristics. For example, one such group might be university-educated black females in their thirties living alone in the northwest and working in the tourism sector. We don’t have to assume that all the people in such a group have identical preferences. But we do need to assume that people with these characteristics display stable tendencies that do not depend importantly on some other unmeasured characteristics.

Second, we need to have reasonably accurate constituency-by-constituency information on the numbers of voters possessing each bundle of characteristics. A good census can provide such information.
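To make the first of these two requirements concrete, here is a minimal sketch of how a tendency for each voter type might be estimated from a national survey. It is not a multilevel regression: it uses a much cruder stand-in, pulling each group's raw Labour share toward the national share, with small groups pulled hardest. The survey counts, the group labels, the `pooled_tendencies` function and the `prior_strength` setting are all invented for illustration.

```python
# Hypothetical national survey: voter type -> (respondents, number saying Labour).
# These counts are made up purely for illustration.
survey = {
    "young renter": (200, 130),
    "young owner": (50, 24),
    "older renter": (150, 66),
    "older owner": (600, 180),
}


def pooled_tendencies(survey, prior_strength=20.0):
    """Estimate each group's Labour tendency, shrunk toward the national share.

    prior_strength (an assumed tuning value) acts like that many extra
    respondents who vote exactly as the nation as a whole does, so groups
    with few respondents are pulled most strongly toward the national figure.
    """
    total_n = sum(n for n, _ in survey.values())
    total_k = sum(k for _, k in survey.values())
    national = total_k / total_n
    return {
        group: (k + prior_strength * national) / (n + prior_strength)
        for group, (n, k) in survey.items()
    }


tendencies = pooled_tendencies(survey)
for group, p in tendencies.items():
    n, k = survey[group]
    print(f"{group}: raw {k / n:.3f}, pooled {p:.3f}")
```

Pulling small groups toward the overall figure is the intuition behind the "multilevel" part of the real method: voter types with few respondents borrow strength from everyone else rather than being estimated in isolation.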

With these two ingredients in place we can construct constituency-by-constituency results by multiplying the tendencies of each voter type by their numbers within the constituency and adding these products up. I’ll take you through a sample calculation along these lines in step 2.20.
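As a rough preview of that arithmetic, the multiply-and-add step can be sketched in a few lines of Python. Every number here is made up for illustration, as are the constituency names, the four voter types and the `poststratify` function; a real model would use far more voter types and real census figures.

```python
# Hypothetical probability of voting Labour for each voter type
# (the output of the regression stage; values invented for illustration).
labour_prob = {
    "young renter": 0.65,
    "young owner": 0.45,
    "older renter": 0.45,
    "older owner": 0.30,
}

# Hypothetical census shares of each voter type in two invented constituencies.
# Within each constituency the shares must add up to 1.
census_shares = {
    "University Town": {
        "young renter": 0.40,
        "young owner": 0.10,
        "older renter": 0.20,
        "older owner": 0.30,
    },
    "Leafy Suburb": {
        "young renter": 0.10,
        "young owner": 0.10,
        "older renter": 0.15,
        "older owner": 0.65,
    },
}


def poststratify(probs, shares):
    """Weight each voter type's probability by its share of the constituency."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("census shares must sum to 1")
    return sum(probs[group] * share for group, share in shares.items())


for name, shares in census_shares.items():
    estimate = poststratify(labour_prob, shares)
    print(f"{name}: estimated Labour share {estimate:.1%}")
```

With these invented figures, the constituency loaded with young renters comes out with a noticeably higher estimated Labour share, even though the same voter-type probabilities are applied in both places. That is the assumption discussed in the video: a young renter behaves similarly wherever they live, and only the mix of voter types differs between constituencies.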

## Discussion

No method is a panacea. They all have weaknesses.

So what do you think are the weaknesses of this particular method?