Ioannidis, he’s probably the most famous scholar doing this kind of work right now. He’s on TV, he writes op-eds, he basically has been writing about these issues for a long time, and he’s having a lot of impact in medical research, and now beyond. So this is the most famous of his pieces. He isn’t shy.
He boldly states the problem: “Most research findings are false, for most research designs, and for most fields.” So he’s basically saying the median paper you read is probably a false positive. That’s his sort of take on the world and he actually has put together a lot of evidence suggesting we should take that claim seriously. I don’t know if that claim is true or not, but even if there’s only some chance it’s true, it’s very worrying. So we have to take it seriously. What he does in this piece is, he doesn’t really provide evidence for it. This piece is really a thought experiment.
This piece just lays out some of the basic probability theory or sort of statistical theory and sort of walks us through why we should be concerned about this problem. Let’s just go through some of these terms. So “R” is the ratio of true findings to not-true findings in an area. What does that mean? Well, first of all, the caveat here: he really sees the world, and it’s simple to see the world in this way, as a zero-one world. There’s a test, it could be true, it could be false. Maybe this is coming from the medical world, like a drug helps a patient or it doesn’t help a patient. Maybe that’s why he’s thinking of it that way.
Of course in reality the world is very continuous. We’re thinking of famous empirical literatures in economics like the literature on the returns to education. But let’s just stay in Ioannidis’ world and just think of this simple binary world. This has an effect, it doesn’t have an effect, it’s simple to think of it that way. But what does this ratio mean? Imagine there’s a certain realm of activity. So development economics, let’s say we’re interested in interventions to improve education in developing countries. How many of those interventions will have a positive impact on learning? For how many of them will there be an effect? That’s sort of what “R” is going to capture.
Maybe if you tested 20 of these, maybe it’s 10 and 10, half of them work, half of them don’t in some meaningful way. So in that case, “R” would be 1. But I think Ioannidis’ point is that probably in a lot of literatures R is a lot less than 1. If we’re really in a world where R is greater than 1, like we’re really, really confident that 70, 80, 90, 95 percent of the hypotheses in an area are true, and that’s true, they really are going to have an impact, then there isn’t as much learning to be done. We’re in a field where we kind of already know most stuff.
Most of the hypotheses are already well understood. So he’s really thinking of a world where R is less than or equal to 1 throughout this exercise, like it’s 50/50. We have a hypothesis and there’s sort of even odds whether or not it’s true. So that’s kind of a reasonable starting point. So he’s going to start it like R equals 1 and sort of go down. And his claim, and again I think he’s influenced by the medical field, is that there are lots of research areas where R is much less than 1.
So in the medical research field, recently there’s been massive mining of genomic data and just tons of, you could say, undisciplined exploratory analysis about what the correlations are between certain genes and certain diseases, where there are like thousands and thousands of possible gene sites and hundreds of diseases, and people are just running tons of tests. And Ioannidis would say that R in that sort of literature would be really small, like 1 over 100 or something like that. Like needle-in-a-haystack research. So again, there’s a range here: in some well-defined literatures with well-defined theory where you have some sense of the setting, maybe R is close to 1, and in others maybe R is .1 or .05.
So that’s kind of the range you should be thinking of probably for most social science research. And then if you just do the algebra, in terms of converting these little r1 and r0 into R, the sort of fraction of true relationships overall is big R over R + 1, that is, R / (R + 1). So that’s going to be a really key term for us. That’s like the plausibility of the hypothesis, basically, based on what we know. The other thing that’s going to be very important in thinking about how to interpret a research result is the probabilities of different types of errors. So there are Type I errors that we care about, false positives. I think everybody is very familiar with Type I errors.
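As a small illustration of the "R over R + 1" conversion just described, here is a minimal Python sketch. The function name `pre_study_probability` and the example values of R are chosen here for illustration; they echo the range mentioned in the talk (R = 1 down to about 1 over 100) rather than anything computed in the paper.

```python
def pre_study_probability(R: float) -> float:
    """Turn the ratio R (true : not-true relationships) into the
    fraction of tested relationships that are true, R / (R + 1)."""
    if R < 0:
        raise ValueError("R must be non-negative")
    return R / (R + 1)


if __name__ == "__main__":
    # Illustrative values only: even odds, then progressively more
    # "needle in a haystack" settings.
    for R in (1.0, 0.1, 0.05, 0.01):
        print(f"R = {R:<5} -> fraction true = {pre_study_probability(R):.3f}")
```

With even odds (R = 1) half of the tested hypotheses are true; with R = 0.01 the fraction is just under 1 percent.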
They’re the basis of our significance tests and usually we’re willing to tolerate a certain amount of false positives, typically 5 percent. So this is the P = .05 that everybody is familiar with, that’s alpha here. There’s also the probability of a Type II error. So not a false positive but you could think of it as a false negative. There is a relationship but you don’t find it. So there is a significant effect here but maybe your sample size is too small, maybe there’s a problem with the design, whatever, and you don’t detect it. The test suggests there’s no effect, but there really is an effect. So 1 minus this false negative rate, 1 minus beta, is our statistical power term.
In other words, how likely are we to find an effect if there really is an effect based on our research design? And in general, large sample sizes will help with statistical power, just because there’s less sampling variation that comes into play. Basically if you have a small sample, there’s lots of sampling variation and you can’t say very much. If you have a sample of 4, God knows what the data is telling you. If you have a sample of 4,000 then maybe some of the noise evens out. The conventional wisdom, for those of you who have designed or will design field experiments or lab experiments: if you’re putting together a proposal for the US National Institutes of Health, they want to see power calculations where you claim you are powered to 80 percent.
They want to basically see what sort of effect size you could legitimately estimate or you’d expect to be able to detect with 80 percent power. That’s like the rule of thumb power. Now, in reality when people have checked the amount of power in actual designs, often empirical studies have much less than 80 percent power. Even funded experiments. Probably a pretty decent study has 50 or 60 percent power. There are plenty of studies, small social psychology or experimental economics lab studies that are like really small sample, way underpowered, where maybe they have 20 or 30 percent power.
So probably the reasonable range in most social science research, I don’t know, making it up a little bit, is like 20 to 80 percent power. 20 is way too low, maybe 50 or 60 is pretty typical for experiments, and 80 would be like you’re pretty well-powered. If you have more than 80 percent power, that’s a well-powered design.
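To make the link between sample size and power concrete, here is a rough Python sketch. It is not from the paper or the talk: it assumes a two-arm experiment with equal arm sizes, a two-sided z-test on a standardized mean difference, and a hypothetical effect size of 0.2 standard deviations. The function name `approx_power` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two equal-sized arms.

    effect_size is the true difference in means divided by the outcome's
    standard deviation (an assumption of this sketch).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    shift = effect_size * sqrt(n_per_arm / 2)     # expected z-statistic under the true effect
    # Probability the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical effect of 0.2 SD; arm sizes chosen only for illustration.
    for n in (2, 50, 393, 2000):
        print(f"n per arm = {n:>4} -> power = {approx_power(0.2, n):.2f}")
```

Under these assumptions, roughly 393 participants per arm are needed to reach the conventional 80 percent power for a 0.2 SD effect, while a total sample of 4 has power barely above the 5 percent false-positive rate.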
So you’re familiar with this, and then just for the thought experiment here in the Ioannidis paper, imagine there had been “c” findings in a literature. There are like “c” studies in this research area. He’s not combining in any serious way across the “c” studies. He’s not thinking about the meta-analysis here. It’s like, “If I see a result that’s out there, what should I take from that result?” And he’s going to abstract away from a lot of additional problems, like file drawer problems and whatnot.
Why most published research findings are false
In 2005, John Ioannidis, well known for his research on the validity of studies in the health and medical sciences, wrote an essay titled “Why Most Published Research Findings Are False.” The blunt title and Ioannidis’s provocative and compelling arguments have made this paper one of the foundational pieces of literature in the areas of meta-science and research transparency. You’d be hard-pressed to find an article on these topics, whether published in a journal or in the popular media, that doesn’t mention it.
In this video, I introduce you to the different types of errors that can occur in research, their probabilities, and the concept of statistical power. We will also learn about Positive Predictive Value, or the believability of a study’s findings, as well as how biases can impact results. The last part of the video lays out six corollaries that characterize scientific research and what scientists can do to improve the validity of research. We go into more depth about these corollaries below.
In the article, Ioannidis lays out a framework for demonstrating:
- the probability that research findings are false,
- the proportion of findings in a given research field that are valid,
- how different biases affect the outcomes of research, and
- what can be done to reduce error and bias.
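The first two items in this list rest on the Positive Predictive Value (PPV), the probability that a claimed finding is actually true. In the no-bias case the paper gives PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds, α the Type I error rate, and 1 − β the power. The Python sketch below simply evaluates that formula; the function name `ppv` and the example inputs are chosen here for illustration.

```python
def ppv(R: float, power: float, alpha: float = 0.05) -> float:
    """Positive Predictive Value without bias.

    R     : pre-study odds that a tested relationship is true
    power : 1 - beta, the chance of detecting a true relationship
    alpha : Type I error rate (false-positive rate)
    """
    beta = 1 - power
    return (power * R) / (R - beta * R + alpha)


if __name__ == "__main__":
    # Illustrative scenarios, not figures taken from the essay's tables.
    scenarios = [
        ("even odds, well powered", 1.0, 0.8),
        ("even odds, underpowered", 1.0, 0.2),
        ("long-shot hypotheses, underpowered", 0.1, 0.2),
    ]
    for label, R, power in scenarios:
        print(f"{label:<36} PPV = {ppv(R, power):.2f}")
```

With even odds and 80 percent power the PPV is about 0.94, but with R = 0.1 and 20 percent power it drops to about 0.29, so a "significant" finding is more likely false than true.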
Ioannidis first defines bias as “the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced.” He goes on to say that “bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias”.
With increasing bias, the chances that findings are true decrease. Reverse bias, the rejection of true relationships due to measurement error, inefficient use of data, or failure to recognize statistically significant relationships, becomes less likely as technology advances.
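The effect of bias on the chance that a finding is true can be sketched numerically. The paper models bias as u, the proportion of analyses that would not otherwise have been "findings" but end up reported as such, and gives PPV = ([1 − β]R + uβR) / (R + α − βR + u − uα + uβR). The Python below evaluates that expression; the function name and the grid of u values are illustrative choices, and reverse bias is not modeled.

```python
def ppv_with_bias(R: float, power: float, u: float, alpha: float = 0.05) -> float:
    """Positive Predictive Value when a fraction u of would-be
    'negative' analyses is reported as positive because of bias."""
    beta = 1 - power
    numerator = power * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator


if __name__ == "__main__":
    # Even pre-study odds and 80 percent power; only the bias level varies.
    for u in (0.0, 0.1, 0.3, 0.5):
        print(f"u = {u:.1f} -> PPV = {ppv_with_bias(1.0, 0.8, u):.2f}")
```

Setting u = 0 recovers the no-bias formula. With R = 1 and 80 percent power, a bias level of u = 0.1 already lowers the PPV from about 0.94 to about 0.85.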
Another important point Ioannidis makes is that, while multiple research teams often study the same or similar research questions, it is the norm that the scientific community as a whole tends to focus on an individual discovery, rather than on broader evidence.
He goes on to list corollaries about the probability that a research finding is indeed true:
Corollary 1: “The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.” He refers here to sample size. Research findings are more likely to be true in fields that conduct larger studies, such as large randomized controlled trials.
Corollary 2: “The smaller the effect sizes in a scientific field, the less likely the research findings are to be true.” Also remember that effect size is related to power. An example of a large effect that is useful and likely true is the impact of smoking on cancer or cardiovascular disease. This is more reliable than small postulated effects like genetic risk factors on disease. Very small effect sizes can be indicative of false positive claims.
Corollary 3: “The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.” Because the pre-study probability that a finding is true influences the post-study probability that it is true, it follows that findings are more likely to be true in confirmatory research than in exploratory research.
Corollary 4: “The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.” “Flexibility”, Ioannidis tells us, “increases the potential for transforming what would be ‘negative’ results into ‘positive’ results”. To combat this, efforts have been made to standardize research conduct and reporting with the belief that adherence to such standards will increase true findings. True findings may also be more common when the outcomes are universally agreed upon, whereas experimental analytical methods may be subject to bias and selective outcome reporting.
Corollary 5: “The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.” Conflicts of interest may be inadequately reported and may increase bias. Prejudice may also arise due to a scientist’s belief or commitment to a theory or their own work. Additionally, some research is conducted out of self-interest to give researchers qualifications for promotion or tenure. These can all distort results.
Corollary 6: “The hotter a scientific field (with more researchers and teams involved), the less likely the research findings are to be true.” When there are many players involved, getting ahead of the competition may become the priority, which can lead to rushed experiments or a focus on obtaining flashy and positive results that are more publishable than negative ones. Additionally, when teams focus on publishing “positive” results, others may want to respond by finding “negative” results to disprove them. The result is something called the Proteus phenomenon, which describes rapidly alternating extreme research claims and opposite refutations.
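Part of the reasoning behind Corollary 6 can be illustrated with the paper's formula for n independent teams testing the same relationship, where a "finding" is declared if at least one team gets a significant result: PPV = R(1 − βⁿ) / (R + 1 − [1 − α]ⁿ − Rβⁿ). The Python sketch below evaluates it assuming no bias and equal power across teams; the function name and the values of n are illustrative, and competitive pressures such as rushed experiments are not captured.

```python
def ppv_multiple_teams(R: float, power: float, n: int, alpha: float = 0.05) -> float:
    """PPV when n independent, equally powered teams probe the same
    relationship and any single significant result counts as a finding."""
    if n < 1:
        raise ValueError("n must be at least 1")
    beta = 1 - power
    numerator = R * (1 - beta ** n)
    denominator = R + 1 - (1 - alpha) ** n - R * beta ** n
    return numerator / denominator


if __name__ == "__main__":
    # Even pre-study odds and 80 percent power; only the number of teams varies.
    for n in (1, 5, 10):
        print(f"{n:>2} teams -> PPV = {ppv_multiple_teams(1.0, 0.8, n):.2f}")
```

With one team the formula reduces to the single-study PPV (about 0.94 here); with ten teams it falls to about 0.71, because the chance that at least one team produces a false positive grows with n.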
Using his framework for determining Positive Predictive Value and the corresponding corollaries, Ioannidis concludes that “most research findings are false for most research designs and for most fields.”
While the wide extent of biased and false research findings may seem a harsh reality, the situation can be improved in a few ways. First, higher-powered and larger studies can lower the proportion of false findings in a literature, with the caveats that such studies are more helpful when they test questions for which the pre-study probability is high and when they focus on broader concepts rather than specific questions. Second, rather than focusing on significant findings from individual studies, researchers should emphasize the totality of evidence. Third, bias can be reduced by enhancing research standards, especially by encouraging pre-study registration. Finally, Ioannidis suggests that, instead of only chasing statistical significance, researchers should focus on understanding pre-study odds.
After reading this, what are your reactions? Are you surprised? How, if at all, does this change your perception about research in general? How might the individual factors described in the corollaries influence each other to exacerbate bias?
Read the full essay on PLOS.org.
Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLoS Med 2 (8): e124. doi:10.1371/journal.pmed.0020124.
© Center for Effective Global Action