
[0:00] Gerber and Malhotra are a useful starting point. They went out and got data from a couple of leading political science journals over a long period of time. They look at the leading statistical tests in those papers, and they just do something really simple and ask, "What does the distribution of p-values look like?" Regardless of what assumptions you make, there's no way it should look like what these guys are going to show it looks like. There are some really weird patterns, and this is something that comes out in literature after literature. And they're really going to focus in on this critical value of .05, 95 percent significance, just because there's so much–

[0:45] what's the right word – it's almost fetishistic, this obsession with 95 percent significance. And there's a feeling a lot of researchers have that if they don't have a result that is significant with p less than .05, it's unpublishable, it's irrelevant, no one will be interested in it. So they're going to focus on this. There are a couple of reasons why you would have bias and only certain results would get published. One is editors and referees. Editors and referees may just have this notion that only results that are significant at 95 percent are worth publishing.

[1:18] Given that, authors may only submit results that have a certain pattern of significance or have p-values less than .05.

[1:28] So maybe there's no manipulation going on, but there's the file drawer problem. That's the so-called cross-study bias problem, meaning I have a whole bunch of results and I only send off the ones that are significant. The others stay in my file drawer. Then there's the possibility of manipulation within a given dataset, with given data: my result isn't actually significant, there really is no effect, and I manipulate the data to get a p-value just less than .05. This is a real quote. You guys should just take a minute and read it. This is a quote that Jeremy Weinstein, a professor of political science at Stanford, presented a couple of years ago at a conference.

[2:06] Maybe you guys can just take a second and read this quote. This is from a real referee report. He won't say who got it, whether it was him or somebody else, but he swears this is a real letter from a journal.

[2:25] So basically what the editor is saying is: the paper – gosh, this is a pretty important question, that's good.

[2:35] They get at a causal impact – that's great. It's a pretty important question and there's some causal evidence on it. But the lack of results–

[2:48] that's what this means – the lack of results, meaning not everything is significant as we had hoped, really weakens it. What you really need to do is generate new results by looking at some subgroups, looking for some heterogeneity, so I can actually say there's a result in this damn paper and publish it in – this was either AJPS or APSR or something like that. So this is a couple of years ago. This is normal; this stuff happens. And it's sort of sad. You might think papers would get judged on the quality of the question, the quality of the design, the quality of the data, the importance of the finding.

[3:27] And if you find there's no effect of something that theory says should have an effect, you might even think that's more interesting. Holy cow, we're actually learning something here, we're not just confirming our priors. Anyhow, this is a concern; this is sort of what we're up against. So let's turn to the political science literature. This is a histogram of z-statistics for the point estimates in papers published over 13 years in the American Political Science Review and the American Journal of Political Science. No one had done this before in political science. There had been a number of papers looking at these kinds of distributions of p-values in other fields, including economics and medical research.

[4:10] Apparently it was quite novel in political science.

[4:15] And they asked the question: is there smoothness – if you look at the distribution of p-values, are they smooth around .05? And again, any naturally occurring distribution of p-values would be smooth around .05. And they reject smoothness at the 1 in 32 billion level. So what does that mean? These are z-statistics, so the key point is going to be here at 1.96 or 2.0, right?

[4:44] So the more significant results are over here on the right; the less significant ones are on the left. The results with z-statistics less than 1.96 are not significant at 95 percent; these are significant at 95 percent. And there's this incredible jump at that point.

[5:03] There are three times as many studies just above the cutoff as just below it.
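The comparison the lecture points to here can be sketched in code. The snippet below is an added illustration, not the authors' code: it shows that a two-sided p-value of .05 corresponds to |z| ≈ 1.96, and it counts reported z-statistics in a narrow window just below and just above that cutoff. The z_stats array and the window width are hypothetical placeholders.

```python
# Illustrative sketch (assumed workflow, not Gerber and Malhotra's code):
# compare how many reported z-statistics fall just below vs. just above
# the 95 percent cutoff.
import numpy as np
from scipy import stats

critical_z = stats.norm.ppf(0.975)    # two-sided .05 threshold, about 1.96
print(f"critical z: {critical_z:.2f}")

# Hypothetical placeholder data standing in for z-statistics collected
# from published articles.
z_stats = np.array([1.72, 1.88, 1.90, 1.97, 1.99, 2.01, 2.05, 2.10, 2.30])

width = 0.20                           # caliper half-width (an assumption)
just_below = ((z_stats >= critical_z - width) & (z_stats < critical_z)).sum()
just_above = ((z_stats >= critical_z) & (z_stats < critical_z + width)).sum()
print(f"just below: {just_below}, just above: {just_above}")
```

In an unselected literature the two counts should be roughly equal; a large imbalance, like the 3-to-1 ratio in the figure, is the signature of selection around the threshold.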

[5:08] There's another paper by the same two authors that came out the same year. They basically did this for political science and sociology. They wanted to get hundreds of articles from the leading journals over a decade and plot this out, and they find the same thing. Now, instead of 3 to 1, it's 2 to 1 in the sociology journals. Here again, visually, it seems particularly stark,

[5:30] where you have this incredible jump. So for some reason papers get published with p-values of .049 but not .051, and that's some combination of those three factors we talked about: editors and referees, the file drawer problem – not sending things out – and data mining.
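As a baseline for what "no selection" looks like, here is a minimal simulation sketch, under arbitrary assumptions about sample size and the mix of null and real effects (it is an added illustration, not part of the lecture): when every result is reported, p-values of .049 and .051 occur about equally often, so the histogram moves smoothly through .05.

```python
# Minimal simulation sketch of an unselected literature (illustrative
# assumptions only): p-values vary smoothly around .05 when nothing
# is filtered or manipulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_per_study = 20_000, 50

pvals = []
for _ in range(n_studies):
    effect = rng.choice([0.0, 0.3])              # mix of null and real effects
    sample = rng.normal(effect, 1.0, n_per_study)
    res = stats.ttest_1samp(sample, 0.0)
    pvals.append(res.pvalue)
pvals = np.asarray(pvals)

# Narrow bins on either side of .05 should hold roughly equal counts:
# the density changes smoothly, with no jump at the threshold.
just_below = ((pvals >= 0.04) & (pvals < 0.05)).sum()
just_above = ((pvals >= 0.05) & (pvals < 0.06)).sum()
print(f"p in [.04, .05): {just_below}   p in [.05, .06): {just_above}")
```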

Do statistical reporting standards affect what is published?

Publication bias can also explain why false positives are so common in peer-reviewed journals. There is a widespread perception among journal editors and reviewers, as well as authors, that only results with at least 95% significance – a p-value less than 0.05 – are worth publishing. In 2008, political scientist Alan Gerber and political economist Neil Malhotra reviewed the reported significance levels just above and just below the 95% threshold in articles published in two leading political science journals, APSR (American Political Science Review) and AJPS (American Journal of Political Science).

As you watch this video and learn about their findings, ask yourself: what does such a bias mean for studies that produce “insignificant” results, but may nonetheless ask important questions and use rigorous methods?
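For readers who want to see what "rejecting smoothness" could look like mechanically, here is a hedged sketch of a simple binomial comparison in the spirit of the caliper-style test described in the video; the counts are hypothetical, and this is not necessarily the exact procedure Gerber and Malhotra used.

```python
# Hedged sketch: a binomial comparison of counts just below vs. just above
# the significance threshold. Hypothetical counts; not necessarily the
# authors' exact procedure.
from scipy import stats

just_below, just_above = 30, 90        # hypothetical counts near the cutoff
n = just_below + just_above

# If the distribution were smooth through the threshold, a result in this
# narrow window would be about equally likely to land on either side,
# so we test against p = 0.5.
result = stats.binomtest(just_above, n=n, p=0.5, alternative="greater")
print(f"p-value for 'too many just above the cutoff': {result.pvalue:.2e}")
```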


This video is from the free online course:

Transparent and Open Social Science Research

University of California, Berkeley