Skip to 0 minutes and 14 seconds
I'm here now with Kristian Lum of the Human Rights Data Analysis Group-- HRDAG. And Kristian is a professional statistician. So we want to take advantage of that, and talk about the statistics of armed conflict. What do you think the statistical approach, the estimation approach, has to offer here, and how does that fit with documentation, and where might one contribute and the other less so?

Yes. So I guess, first of all, I think just the documentation process itself-- getting accurate facts of each case-- is extremely important from the point of view of establishing the facts for each individual incident. But beyond that, we probably want to know something about patterns across time and space.

Skip to 1 minute and 5 seconds
We want to know if killings increase during this time or decrease during this time. Killings of, say, this type of person were higher than killings of that type of person, to help establish targeting of different ethnic groups, maybe targeting by region, things like that. And statistics can help us with all of that. It helps us to understand patterns over time and space, understand all these different types of comparisons we might want to make to compare policies. Did this policy really lead to a reduction in the number of killings overall or maybe of this certain group or whatever?

Skip to 1 minute and 38 seconds
So I think that's sort of the role of statistics in this case: to both help us establish patterns, and also, I think-- you said this as well-- to estimate the number of people who were killed within each group beyond what we're able to document individually, one by one, or even incident by incident, to help us to estimate the number of people who were not documented in any of the various efforts at documentation.

So we talked-- in the estimation part of the course, we talked mostly about household surveys.

Skip to 2 minutes and 13 seconds
And then, I have one short lecture in there on the capture-recapture method, or as you may prefer to call it, multiple systems estimation. I gave my own brief explanation, which was really just about the idea of it. But perhaps you would like to just start by giving your version of how this works and what you think it's useful for.

Sure.

Skip to 2 minutes and 44 seconds
So household surveys aside, where you have gone to the trouble of coming up with a sampling frame and made sure that the sample that you're drawing is representative, that aside, most of the time, when we are getting data to make these sorts of estimates, we haven't had the benefit of that sort of background of trying to get a representative sample. Usually what's happened is different organisations have gone out and just tried to collect pretty much as many-- have tried to document pretty much as many cases as they possibly can, without some sort of principle guiding them to make sure they equally document all the different cases that have happened.

Skip to 3 minutes and 17 seconds
And by saying that, it's certainly not a criticism of the groups that are documenting the cases. It's extremely hard work. And I think their goal is not necessarily to get a representative sample, it's just to document as much as they can. And so with that mandate, I think they all do a fantastic job of doing it. But the problem with that data-- which is what we often call a convenience sample-- is that there is no guarantee that it's representative necessarily. Right? Maybe church groups are documenting killings. And in that scenario, you can imagine people who attended that church would be more likely, if they had been killed, to appear on the list that's being compiled by the church.

Skip to 3 minutes and 54 seconds
And people who didn't attend that church, right, or people who are of that religion would be more likely to appear on that list than people who are not. And so if you were to draw inferences directly from that list-- as you said, take it at face value or sort of treat it as representative-- you might, for example-- in this sort of contrived example-- end up inferring that, say, people who are of that religion or do attend that church were killed at a much higher rate than people who are not. But that actually might not be the case. It might just be the case that they are more likely to report to that particular list.
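The contrived church-list example can be put into numbers. The following is a minimal sketch, not anything taken from the interview: the population sizes, the death rate and the documentation probabilities are all invented for illustration. Both groups are given the same true death rate, and only the chance of being documented differs.

```python
# Hypothetical numbers, chosen only to illustrate the argument.
# Both groups suffer the SAME true death rate; the church list simply
# records members far more often than non-members.
population = {"members": 10_000, "non_members": 10_000}
true_death_rate = 0.02                                   # assumed, identical for both groups
documentation_prob = {"members": 0.60, "non_members": 0.10}  # assumed

def documented_deaths(group):
    """Expected number of deaths from this group that end up on the list."""
    true_deaths = population[group] * true_death_rate
    return true_deaths * documentation_prob[group]

# Naive approach: treat the list as if it were representative.
naive_rate = {g: documented_deaths(g) / population[g] for g in population}
naive_ratio = naive_rate["members"] / naive_rate["non_members"]

print("naive death rates:", naive_rate)
print("apparent risk ratio (members vs non-members):", naive_ratio)
```

Under these assumed numbers the list alone suggests that members were killed at several times the rate of non-members, even though the true rates are equal by construction. The gap comes entirely from the difference in documentation probability.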

Skip to 4 minutes and 22 seconds
Those are the ones that the people compiling the list were looking for.

Yes. And documenting. And so it gets very hard to disentangle the sorts of things we want to understand. Was there targeting of this specific group? Right? It's hard to disentangle. Were they more likely to be reported or were they more likely to be killed? And so when we have multiple different organisations doing this sort of documentation, we can use capture-recapture or multiple systems estimation-- it's a statistical methodology that goes by multiple names, depending on the field it comes from or sort of your background in dealing with it-- to estimate the number of people who appeared on none of the lists.

Skip to 4 minutes and 56 seconds
So for example, say we have one church group documenting; we have one NGO documenting; we have one government list documenting. How this works is you take all of those different lists and then you calculate the overlaps among those lists. So you say 25 people were documented by all three of those groups. 10 people were documented by the first two but not the third one. Something like that. So you end up coming up with all of these various overlaps, and then, using statistical modelling-- it's a little bit more principled than how I'm describing it now-- you can estimate the number of people who appeared on none of those lists.
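The overlap-counting step described here can be sketched in a few lines. This is an illustration only: the three lists and the victim identifiers (`v01`, `v02`, ...) are made up, and the sketch assumes that records from the different lists have already been matched to a shared identifier, which in practice is a substantial task in its own right.

```python
from collections import Counter

# Hypothetical lists of already-matched victim identifiers.
lists = {
    "church": {"v01", "v02", "v03", "v04", "v05"},
    "ngo": {"v02", "v03", "v05", "v06", "v07"},
    "government": {"v03", "v05", "v07", "v08"},
}
list_names = list(lists)

# Everyone who appears on at least one list.
all_documented = set().union(*lists.values())

def capture_pattern(victim_id):
    """Return a tuple of 0/1 flags: is this victim on each list, in order?"""
    return tuple(int(victim_id in lists[name]) for name in list_names)

# Count how many victims share each pattern of list membership.
overlap_counts = Counter(capture_pattern(v) for v in all_documented)

for pattern, count in sorted(overlap_counts.items(), reverse=True):
    print(dict(zip(list_names, pattern)), count)
```

The pattern `(1, 1, 1)` counts people documented by all three groups, `(1, 1, 0)` those on the first two lists but not the third, and so on. The one pattern that can never be observed is `(0, 0, 0)`, people on none of the lists, and that is the cell the statistical model is used to estimate.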

Skip to 5 minutes and 33 seconds
And sort of the intuition behind this is that if there is a lot of-- in general-- this is obviously sort of a cartoon version of this. It is a principled, statistical methodology. But if there are a lot of overlaps, then you would expect that the lists combined have gotten most of the killings. Whereas, if there is not that much overlap among the lists, well, there's probably a lot more out there to be documented that none of them have gotten. And so by stratifying-- that means dividing up into different groups-- we can get estimates of, say, the number of people killed, make an estimate for group A. Make a separate estimate for group B.
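The overlap intuition is easiest to see in the simplest two-list case, using the classical Lincoln-Petersen estimator. This is a swapped-in simplification, not the modelling approach discussed in the interview: it assumes exactly two lists that capture people independently of each other, and the counts below are invented. If list one has $n_1$ names, list two has $n_2$, and $m$ names appear on both, the estimate of the total is

$$\hat{N} = \frac{n_1 \, n_2}{m}.$$

```python
def lincoln_petersen(n1, n2, both):
    """Two-list capture-recapture estimate.

    Assumes the two lists are independent and that everyone has the same
    chance of appearing on a given list. Returns the estimated total and
    the estimated number of people missed by both lists.
    """
    if both == 0:
        raise ValueError("no overlap: the estimator is undefined")
    estimated_total = n1 * n2 / both
    observed = n1 + n2 - both          # people on at least one list
    return estimated_total, estimated_total - observed

# Hypothetical counts: two lists of 100 names each.
high_overlap = lincoln_petersen(100, 100, both=80)
low_overlap = lincoln_petersen(100, 100, both=20)

print("high overlap (total, missed):", high_overlap)
print("low overlap  (total, missed):", low_overlap)
```

With the same list sizes, the high-overlap case implies that only a handful of people were missed, while the low-overlap case implies that most of the total was never documented. Real applications with three or more lists use more flexible models that relax the independence assumption.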

Skip to 6 minutes and 3 seconds
Then, we can make comparisons of the number of people killed from group A versus group B, even in cases where, say, group A is much more likely to be documented than group B. And so we sort of can model our way around, in some cases, that problem that I was talking about, where you can't disentangle whether group A was more likely to be killed or just more likely to be reported.

OK. Thank you very much.

Oh, yeah. Sure. It was my pleasure.

That was a good interview. Thanks.
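The stratified comparison of group A and group B can be sketched the same way. Again this is only an illustration under stated assumptions: it uses the simple two-list Lincoln-Petersen estimator rather than the models used in practice, and the counts for each group are invented so that group A is documented much more thoroughly than group B.

```python
def lincoln_petersen(n1, n2, both):
    """Two-list estimate of the total, assuming independent lists."""
    return n1 * n2 / both

# Hypothetical per-group counts: (names on list 1, names on list 2, names on both).
strata = {
    "group_A": (200, 150, 100),   # heavily documented, large overlap
    "group_B": (40, 30, 4),       # sparsely documented, small overlap
}

documented = {}
estimated = {}
for group, (n1, n2, both) in strata.items():
    documented[group] = n1 + n2 - both              # on at least one list
    estimated[group] = lincoln_petersen(n1, n2, both)  # separate estimate per stratum

print("documented deaths:", documented)
print("estimated deaths: ", estimated)
```

In this made-up example the raw documented counts suggest that far more people were killed in group A, but the separate estimates for each stratum come out equal. The apparent difference between the groups is a difference in documentation, which is the distinction the stratified estimates are meant to recover.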

Kristian Lum on capture-recapture

In this clip I interview Kristian Lum, lead statistician at the Human Rights Data Analysis Group.

I underline here what I view as Kristian’s main point in this interview. She believes that the primary contribution the field of statistics can make to the field of war-death accounting is to help us understand violence patterns. These can be patterns over time, space, characteristics of perpetrators and victims or other features we might be interested in. She argues that the true contours of these patterns may get distorted in even high-quality casualty recording projects. However, she continues, capture-recapture estimation can reduce such distortions.

So Kristian does not view the estimation of the total number of deaths as the most important thing. She is much more interested in understanding patterns.

This video is from the free online course:

Accounting for Death in War: Separating Fact from Fiction

Royal Holloway, University of London