Ioannidis is probably the most famous scholar doing this kind of work right now. He's on TV, he writes op-eds, he has been writing about these issues for a long time, and he's having a lot of impact in medical research, and now beyond. This is the most famous of his pieces. He isn't shy.

He boldly states the problem: "Most research findings are false, for most research designs, and for most fields." So he's basically saying the median paper you read is probably a false positive. That's his take on the world, and he has put together a lot of evidence suggesting we should take that claim seriously. I don't know whether that claim is true, but even if there's only some chance it's true, it's very worrying, so we have to take it seriously. He doesn't really provide evidence for it in this piece, though. This piece is really a thought experiment.

This piece just lays out some of the basic probability theory, or statistical theory, and walks us through why we should be concerned about this problem. Let's go through some of these terms. So "R" is the ratio of true findings to not-true findings in an area. What does that mean? Well, first of all, the caveat here: he really sees the world as a zero-one world, and it's simple to see the world that way. There's a hypothesis being tested, and it could be true or it could be false. Maybe this comes from the medical world, where a drug helps a patient or it doesn't help a patient. Maybe that's why he's thinking of it that way.

Of course, in reality the world is very continuous. Think of famous empirical literatures in economics, like the literature on the returns to education. But let's stay in Ioannidis' world and think of this simple binary world. This has an effect or it doesn't have an effect; it's simple to think of it that way. But what does this ratio mean? Imagine a certain realm of activity. Take development economics: let's say we're interested in interventions to improve education in developing countries. How many of those interventions will have a positive impact on learning? For how many of them will there be an effect? That's what "R" is going to capture.

Maybe if you tested 20 of these, it's 10 and 10: half of them work and half of them don't, in some meaningful way. In that case, "R" would be 1. But I think Ioannidis' point is that in a lot of literatures R is probably a lot less than 1. If we're really in a world where R is greater than 1, where we're really confident that 70, 80, 90, 95 percent of the hypotheses in an area are true, and they really are going to have an impact, then there isn't as much learning to be done. We're in a field where we already know most stuff.

Most of the hypotheses are already well understood. So throughout this exercise he's really thinking of a world where R is less than or equal to 1. At R equals 1 it's 50/50: we have a hypothesis and there are even odds whether or not it's true. That's a reasonable starting point. So he's going to start at R equals 1 and go down. And his claim, and again I think he's influenced by the medical field, is that there are lots of research areas where R is much less than 1.

Take the medical research field, where recently there's been massive mining of genomic data and tons of what you could call undisciplined exploratory analysis of the correlations between certain genes and certain diseases. There are thousands and thousands of possible gene sites and hundreds of diseases, and people are just running tons of tests. Ioannidis would say that R in that sort of literature would be really small, like 1 over 100 or something like that. It's needle-in-a-haystack research. So again, there's a range here: in some well-defined literatures with well-defined theory, where you have some sense of the setting, maybe R is close to 1, and in others maybe R is .1 or .05.

That's probably the range you should be thinking of for most social science research. And then if you just do the algebra, converting these little r1 and r0 into R, the fraction of true relationships overall is big R over (R + 1). That's going to be a really key term for us. It's the plausibility of the hypothesis, basically, based on what we know. The other thing that's going to be very important in thinking about how to interpret a research result is the probabilities of different types of errors. So there are Type I errors that we care about, false positives. I think everybody is very familiar with Type I errors.
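The conversion from the odds R to the fraction of true relationships, R / (R + 1), can be sketched in a few lines of Python. This is only an illustration: the function name `prior_probability` is made up for this note, and the R values are the ones mentioned in the talk.

```python
def prior_probability(R: float) -> float:
    """Convert the odds R (true : not-true relationships) into the
    fraction of tested relationships that are true, R / (R + 1)."""
    if R < 0:
        raise ValueError("R must be non-negative")
    return R / (R + 1)


# R values mentioned in the talk: even odds, 0.1, 0.05, and 1 in 100.
for R in (1.0, 0.1, 0.05, 0.01):
    share = prior_probability(R)
    print(f"R = {R:<4} -> fraction of true relationships = {share:.3f}")
```

With R = 1 the fraction is one half, and with R = 0.1 only about 9 percent of the hypotheses being tested are true.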

They're the basis of our significance tests, and usually we're willing to tolerate a certain amount of false positives, typically 5 percent. So this is the p = .05 threshold that everybody is familiar with; that's alpha here. There's also the probability of a Type II error. So not a false positive, but you could think of it as a false negative: there is a relationship but you don't find it. There is a real effect here, but maybe your sample size is too small, maybe there's a problem with the design, whatever, and you don't detect it. The test suggests there's no effect, but there really is an effect. That probability is beta, so 1 minus this false negative rate, 1 minus beta, is our statistical power term.

In other words, how likely are we to find an effect if there really is an effect, based on our research design? In general, large sample sizes will help with statistical power, just because there's less sampling variation that comes into play. Basically, if you have a small sample, there's lots of sampling variation and you can't say very much. If you have a sample of 4, God knows what the data are telling you. If you have a sample of 4,000, then maybe some of the noise evens out. The conventional wisdom, for those of you who have designed or will design field experiments or lab experiments: if you're putting together a proposal for the US National Institutes of Health, they want to see power calculations where you claim you are powered to 80 percent.
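To make the link between sample size and power concrete, here is a rough sketch. It assumes a setup the talk does not specify: a two-sided z-test comparing the means of two equal-sized groups, with the effect measured in standard deviation units. The function name and the effect size of 0.2 are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(effect_sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test for a difference in means
    between two equal-sized groups (assumed setup, known unit variance)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected z-statistic when the true effect is effect_sd standard deviations.
    shift = effect_sd * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Total samples of 4, 40, 400 and 4,000, split evenly between two groups.
for n in (2, 20, 200, 2000):
    print(f"n per group = {n:<5} power = {power_two_sample(0.2, n):.2f}")
```

With a total sample of 4 the power is barely above alpha, so a significant result says very little. With a total sample of 4,000 the same effect is detected almost every time.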

They basically want to see what effect size you'd expect to be able to detect with 80 percent power. That's the rule-of-thumb power. Now, in reality, when people have checked the amount of power in actual designs, empirical studies often have much less than 80 percent power, even funded experiments. Probably a pretty decent study has 50 or 60 percent power. There are plenty of studies, small social psychology or experimental economics lab studies, with really small samples that are way underpowered, where maybe they have 20 or 30 percent power.
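The effect size a design can detect with 80 percent power can be approximated with a standard back-of-the-envelope formula. The sketch below assumes a two-sided z-test on two equal-sized groups with the effect in standard deviation units; that setup, the function name, and the sample sizes are assumptions for illustration, not something taken from the talk or from any funder's guidelines.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_per_group: int, power: float = 0.80,
                              alpha: float = 0.05) -> float:
    """Approximate smallest effect (in SD units) detectable with the given
    power in a two-sided, two-group z-test (assumed setup).
    Ignores the negligible chance of rejecting in the wrong direction."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = .05
    z_power = z.inv_cdf(power)          # about 0.84 for 80 percent power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


for n in (25, 100, 400, 1600):
    mde = minimum_detectable_effect(n)
    print(f"n per group = {n:<5} minimum detectable effect = {mde:.2f} SD")
```

Quadrupling the sample size only halves the minimum detectable effect, which is one reason small lab studies end up underpowered for modest effects.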

So probably the reasonable range in most social science research, and I'm making it up a little bit, is 20 to 80 percent power. 20 is way too low, maybe 50 or 60 is pretty typical for experiments, and 80 would be pretty well-powered. If you have more than 80 percent power, that's a well-powered design.

So you're familiar with this. Then, just for the thought experiment here in the Ioannidis paper, imagine there have been "c" findings in a literature; there are "c" studies in this research area. He's not combining across the "c" studies in any serious way. He's not thinking about meta-analysis here. It's like, "If I see a result that's out there, what should I take from that result?" And he's going to abstract away from a lot of additional problems, like file drawer problems and whatnot.
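The question "what should I take from that result?" is what the positive predictive value (PPV) answers. The sketch below combines the three ingredients introduced so far, assuming a single study and no bias: the share of tested relationships that are true, R / (R + 1), times power gives the true positives, and the share that are not true, 1 / (R + 1), times alpha gives the false positives. The function name and the example scenarios are illustrative.

```python
def ppv(R: float, power: float, alpha: float = 0.05) -> float:
    """Probability that a statistically significant finding is true,
    for a single study with no bias (simplified case).

    true positives  are proportional to power * R
    false positives are proportional to alpha * 1
    """
    true_positives = power * R
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


# Scenarios built from the ranges mentioned in the talk (illustrative only).
scenarios = [
    ("even odds, well powered", 1.0, 0.80),
    ("even odds, typical power", 1.0, 0.50),
    ("long shot, low power", 0.1, 0.20),
    ("needle in a haystack", 0.01, 0.20),
]
for label, R, power in scenarios:
    print(f"{label:<26} R = {R:<5} power = {power:.2f} PPV = {ppv(R, power):.2f}")
```

Under these assumptions a significant finding is more likely true than false only when power times R exceeds alpha. With R = 0.1 and 20 percent power that product is 0.02, so fewer than a third of significant findings would be true.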

Why most published research findings are false

In 2005, John Ioannidis, well known for his research on the validity of studies in the health and medical sciences, wrote an essay titled “Why Most Published Research Findings Are False.” This and the next two videos explore Dr. Ioannidis’s findings and his suggestions for ways forward.

In this video, we are introduced to different types of errors, their probabilities, and the concept of statistical power. We will also learn about Positive Predictive Value, or the believability of a study’s findings, as well as how biases can impact results. The last part of the video lays out six corollaries that characterize scientific research and what scientists can do to improve its validity.

This video is from the free online course:

Transparent and Open Social Science Research

University of California, Berkeley