Skip to 0 minutes and 13 seconds What about multiple systems estimation, or capture-recapture? Some of this might apply. So they also have methods for calculating uncertainty intervals, and those might do a good job of capturing some of the uncertainty, but perhaps not all of it. Normally not all of it. Normally they are accounting for the error that comes from the tacit assumption in multiple systems estimation that the people captured on the various casualty lists are in some sense a random sample of a larger total. That creates variation, and that kind of variation is captured well by multiple systems estimation experts.
Skip to 1 minute and 8 seconds But going back to what we said at the beginning of our conversation about documentation error, or linkage errors, or misclassification, that's much harder to ripple through and understand how it might really change the estimates. It should be clear to students who understand multiple systems estimation that if you don't de-duplicate records effectively, so you've got the same person on two different lists but don't recognise that it's the same person, that has a clear directional effect: not only do you count one death as two, but then, through the principles of multiple systems estimation, the error gets magnified much more, because they look like two unique deaths that were each only counted by one list.
Skip to 2 minutes and 3 seconds I gave the example in the course of trying to figure out how many fish are in a pond: dipping the net and tagging the fish, then dipping again and seeing how many fish come up on each dip, but also how many tagged fish come up on the second dip. Let's say the tags fall off. Yes, the tags falling off is exactly what I'm describing. That's failing to recognise that it's the same fish. Right. Not recognising it's the same death on two different lists. And it will lead, first of all, to the basic documentation number being too high.
Skip to 2 minutes and 38 seconds But the irony of multiple systems estimation is that if that happens a lot, not only do you get the basic count of unique deaths wrong, too high, but you also get the estimates way too high. So you had two dips, and one thing is just calculating the total number of fish that you've discovered. Yes, that you've tagged at some point. And that you've tagged at some point. And if some tags have fallen off, then you count some fish twice. Correct. And you end up with a count that is too high. But it's only too high by exactly the number of tags that have fallen off. Correct.
Skip to 3 minutes and 24 seconds But now when you do the statistical estimation, that effect is going to get magnified, because the failure to identify matched fish makes it look like you're hardly capturing any fish twice; you're underestimating how many fish you're capturing twice. And so it gets leveraged up, essentially. Correct. It gets leveraged up quite substantially. And that can be very hard to quantify.
Skip to 3 minutes and 54 seconds That kind of sensitivity analysis is now the modern approach to dealing with this: you start postulating possible error rates from what you know about the way you're identifying deaths, how much information you have to identify specific deaths and therefore how able you are to recognise the same individual twice. If Nick Jewell is on one list, and on the second list it's just NP Jewell, well, is that the same person or not? Normally, when you've got the context, you can postulate reasonable rates of error in this deduplication, and then in a sensitivity analysis work out how those errors would modify the uncertainty in the MSE.
Skip to 4 minutes and 51 seconds And this is uncertainty beyond the basic statistical uncertainty. It's propagating another form of error through your estimates. That's a second stage, and it's hard to do. But it's important. OK. Thank you very much for joining us. I'm sure the students have benefited a lot from that. And we'll see you soon.
Nicholas Jewell on capture-recapture
Here we enjoy our last few moments with Nicholas Jewell. Most of the conversation is about one specific point: the leveraging effect of errors in determining overlaps between lists of war dead.
Let’s illustrate Nicholas’ point with an extremely simple example. For ease of exposition I will identify individual war deaths with numbers.
List 1 contains the following deaths - 1, 2, 3
List 2 contains the following deaths - 2, 3, 4, 5
The two lists combined contain 5 unique deaths - 1, 2, 3, 4 and 5. Two deaths appear on both lists - 2 and 3.
The capture-recapture estimate for the total number of deaths is
(3 x 4)/2 = 6
Thus, we estimate that there is 1 death not captured on either list.
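This two-list calculation can be reproduced in a few lines of Python. The sketch below is illustrative and not part of the course materials; the list contents are the ones from the example above.

```python
# Two-list capture-recapture (Lincoln-Petersen) estimate for the example above.
list1 = {1, 2, 3}
list2 = {2, 3, 4, 5}

documented = list1 | list2   # unique deaths appearing on either list
overlap = list1 & list2      # deaths matched on both lists: 2 and 3

estimate = len(list1) * len(list2) / len(overlap)

print(len(documented))  # 5 unique documented deaths
print(estimate)         # (3 x 4)/2 = 6.0, so 1 death estimated uncaptured
```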
Now suppose we make a mistake in matching deaths across lists. In particular, we classify death 2 on list 1 as different from death 2 on list 2. This could happen if, for example, we match based on names but the name of death 2 is misspelled on one of the lists. Or maybe there is a coding error that gets the location of death 2 wrong on one of the lists.
This failure to match death 2 across the lists causes two separate problems with our estimate.
Mistake 1: we think that the two lists combined contain 6 unique deaths when, in reality, they contain only 5.

Mistake 2: we think that only one death appears on both lists when, in reality, 2 deaths appear on both lists.
Our mistaken capture-recapture estimate is now
(3 x 4)/1 = 12
So our mistaken estimate exceeds the correct estimate by 6. One of these 6 comes from Mistake 1, while 5 of the 6 come from Mistake 2. This big effect of Mistake 2 is the leveraging effect that Nicholas stresses.
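The leveraging effect can be seen by rerunning the same calculation with the matching error in place. In this illustrative sketch (again, not part of the course materials), death 2 appears on list 2 under a hypothetical garbled identity, "2b", so the match is missed.

```python
# Same estimator, but death 2 on list 2 goes unmatched because it appears
# under a hypothetical garbled identity "2b" (e.g. a misspelled name).
list1 = {1, 2, 3}
list2 = {"2b", 3, 4, 5}   # death 2 disguised by the matching error

overlap = list1 & list2   # only death 3 is matched now
mistaken = len(list1) * len(list2) / len(overlap)

print(len(list1 | list2))  # 6 apparent unique deaths (Mistake 1: off by 1)
print(mistaken)            # (3 x 4)/1 = 12.0 (Mistake 2 leverages the error)
```

Comparing 12 with the correct estimate of 6 shows how a single missed match roughly doubles the estimate in this tiny example.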
© Royal Holloway, University of London