Skip to 0 minutes and 0 secondsSo what do these folks do? They basically are very worried about the "u" term, "Researcher Degrees of Freedom." In other words, once you have data in hand, stuff you can do to make an effect significant that is really a null effect, that's really a zero.
Skip to 0 minutes and 18 secondsAnd that leads them to believe that for any given research study that they read they're pretty skeptical about whether it's a real effect. So this is their main table, and they're interested in the likelihood of obtaining a false-positive. So this is just 1 minus the PPV. The PPV is like what are the odds that when I see a result it's real. And this is sort of the opposite– not exactly, but when I see a result, is it a false-positive, basically. Kind of like 1 minus the PPV. The wording isn't exactly the same, the denominators are different in those two, but let's just say it's roughly 1 minus PPV.
Skip to 0 minutes and 58 secondsAnd they look at researcher degrees of freedom in four cases, which are listed out here. Now, remember, this data is pure noise. They've generated noise. So anything – and we're going to focus on this P less than .05 column, so standard levels of significance. Given that it's noise, the chance that we reject the null for all of these cases should be exactly or close to .05, other than some sampling variation. If it was .04 I'd be happy. If it was .06 I'd be happy. But it shouldn't deviate much from .05.
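The baseline claim here can be checked with a small simulation. This is a minimal sketch, not the authors' code: it assumes two cells of 20 standard-normal observations each and a pooled two-sample t-test, and it hardcodes an approximate critical value instead of computing a p-value. The function names and defaults are illustrative.

```python
import random
import statistics

# Assumption: approximate two-sided 5% critical value of Student's t with 38 df
# (two cells of 20 observations each).
T_CRIT_DF38 = 2.024


def t_statistic(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5


def baseline_rate(n_sims=4000, n_per_cell=20, seed=0):
    """Share of pure-noise experiments in which a single planned test rejects the null."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0, 1) for _ in range(n_per_cell)]
        treated = [rng.gauss(0, 1) for _ in range(n_per_cell)]
        if abs(t_statistic(control, treated)) > T_CRIT_DF38:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print(baseline_rate())
```

With one test decided in advance, the printed rate should sit close to .05, up to sampling variation.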
Skip to 1 minute and 36 secondsSo what are the different cases that they look at? The first is, what if you just had two outcome variables? Very typical situation when you're, say, doing fieldwork or you're in the lab. You collect a couple different measures of the same thing. And they say, what is the researcher degrees of freedom here? They're actually going to allow these things to be pretty highly correlated. They're not just noise. They're two correlated measures at .5 of the same thing. If I can focus on one as my main outcome and sort of ignore the other, if I can focus on measure B rather than measure A, or if I can take the average of the two.
Skip to 2 minutes and 12 secondsThose are the three things I can do with the data. If I can do those three things with these highly correlated outcome measures, I've doubled the chance I can reject the null.
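The two-outcome case can be sketched the same way. This is an illustrative simulation under stated assumptions, not the paper's code: 20 observations per cell, two noise measures correlated at .5, and a researcher who reports whichever of measure A, measure B, or their average comes out significant. The critical value is a hardcoded approximation and all names are hypothetical.

```python
import random
import statistics

# Assumption: approximate two-sided 5% critical value of Student's t with 38 df.
T_CRIT_DF38 = 2.024


def t_statistic(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5


def draw_cell(rng, n, rho):
    """n subjects, each with two standard-normal measures correlated at rho."""
    cell = []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        cell.append((a, b))
    return cell


def any_outcome_rate(n_sims=4000, n_per_cell=20, rho=0.5, seed=1):
    """Share of pure-noise experiments where A, B, or their average is significant."""
    outcomes = [
        lambda s: s[0],               # measure A
        lambda s: s[1],               # measure B
        lambda s: (s[0] + s[1]) / 2,  # average of the two
    ]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control = draw_cell(rng, n_per_cell, rho)
        treated = draw_cell(rng, n_per_cell, rho)
        if any(abs(t_statistic([f(s) for s in control], [f(s) for s in treated])) > T_CRIT_DF38
               for f in outcomes):
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print(any_outcome_rate())
```

Each of the three tests on its own still rejects about 5 percent of the time. The inflation comes only from being free to pick among them after seeing the data.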
Skip to 2 minutes and 23 secondsCase two, and this is really, again, this is where we're more in like the lab world. Can I collect ten more observations for a given cell? So I'm in the lab, I'm looking at my data as it comes in. I'm like, "This is looking pretty interesting, I want a little more data for treatment 2, I think that one's kind of interesting." If I can do that and get 10 more observations per cell, just 10 more observations, I'm already from 5 to 8 percent. And apparently – and he has some data in the paper here – 70 percent of lab social psychologists do this. They kind of look at their data and decide which treatments to get more data for.
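A rough sketch of this "peek, then collect 10 more" rule follows. It assumes 20 observations per cell at the first look, 10 more per cell if the first test is not significant, and hardcoded approximate critical values for each sample size. It is not the authors' code and the names are illustrative.

```python
import random
import statistics

# Assumptions: approximate two-sided 5% critical values of Student's t.
T_CRIT_DF38 = 2.024  # 20 observations per cell
T_CRIT_DF58 = 2.002  # 30 observations per cell


def t_statistic(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5


def optional_stopping_rate(n_sims=4000, n_start=20, n_extra=10, seed=2):
    """Share of pure-noise experiments that are significant at either look."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0, 1) for _ in range(n_start)]
        treated = [rng.gauss(0, 1) for _ in range(n_start)]
        significant = abs(t_statistic(control, treated)) > T_CRIT_DF38
        if not significant:
            # Not significant yet, so collect more data and test again.
            control += [rng.gauss(0, 1) for _ in range(n_extra)]
            treated += [rng.gauss(0, 1) for _ in range(n_extra)]
            significant = abs(t_statistic(control, treated)) > T_CRIT_DF58
        if significant:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print(optional_stopping_rate())
```

The second look can only add rejections, never remove them, which is why the rate drifts above the nominal .05.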
Skip to 3 minutes and 1 secondSo already we can't really believe the p-values there. Three is really another useful one. What if I just have one covariate, gender? And again, it's correlated with the outcomes only through chance here, it's all noise. What if I have one covariate, so I can either control for the covariate or not, or I can focus on the interaction between treatment and the covariate, just to look at subgroups. Those are the things I could do. Will any of those yield a P less than .05? I'm at 11.7 percent. Just one covariate. People have dozens or hundreds of covariates.
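The covariate case is harder to reproduce in a few lines, so the sketch below swaps in a simpler stand-in: instead of covariate-adjusted and interaction analyses, it lets the researcher test the full sample, women only, or men only, and report any significant result. It assumes 10 women and 10 men per cell, pure-noise outcomes, and hardcoded approximate critical values. Because the analysis is different, the rate it prints is not expected to match the 11.7 percent quoted above.

```python
import random
import statistics

# Assumptions: approximate two-sided 5% critical values of Student's t.
T_CRIT_DF38 = 2.024  # full sample, 20 per cell
T_CRIT_DF18 = 2.101  # one gender only, 10 per cell


def t_statistic(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5


def subgroup_rate(n_sims=4000, n_per_gender=10, seed=3):
    """Share of pure-noise experiments significant overall or within one gender."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # cells[condition][gender] holds noise outcomes for that subgroup.
        cells = {
            cond: {g: [rng.gauss(0, 1) for _ in range(n_per_gender)] for g in ("women", "men")}
            for cond in ("control", "treated")
        }
        control, treated = cells["control"], cells["treated"]
        tests = [
            (control["women"] + control["men"], treated["women"] + treated["men"], T_CRIT_DF38),
            (control["women"], treated["women"], T_CRIT_DF18),
            (control["men"], treated["men"], T_CRIT_DF18),
        ]
        if any(abs(t_statistic(x, y)) > crit for x, y, crit in tests):
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print(subgroup_rate())
```

Even this cruder version shows the same pattern: one extra variable, used only to slice the sample, is enough to push the rejection rate well above .05.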
Skip to 3 minutes and 43 secondsAnd then the last one here, there were three experimental conditions– what if I could just sort of exclude the data from one of them and sort of not report it? And say, "Oh, that treatment didn't work." And that happens all the time, again, in field experiments and lab experiments. People say, "Ah, that just didn't kind of work." Again, I have a much higher chance of getting a P less than .05. So any of these very limited things, 10 observations, 1 covariate, 2 outcomes, lead to misleading p-values. But what gets really scary in this exercise is the ability to combine them.
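The fourth case, quietly dropping one of three conditions, can be sketched as trying every pairwise comparison and reporting whichever one is significant. This is a simplified illustration assuming 20 pure-noise observations per condition and a hardcoded approximate critical value. It only tries the three pairwise comparisons, which may not match the exact set of tests the authors ran.

```python
import random
import statistics
from itertools import combinations

# Assumption: approximate two-sided 5% critical value of Student's t with 38 df.
T_CRIT_DF38 = 2.024


def t_statistic(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5


def drop_condition_rate(n_sims=4000, n_per_cell=20, seed=4):
    """Share of pure-noise experiments where some pair of the three conditions differs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        conditions = [[rng.gauss(0, 1) for _ in range(n_per_cell)] for _ in range(3)]
        # Comparing one pair amounts to dropping the third condition from the report.
        if any(abs(t_statistic(x, y)) > T_CRIT_DF38 for x, y in combinations(conditions, 2)):
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print(drop_condition_rate())
```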
Skip to 4 minutes and 17 secondsSo if I can do all four of these things– add a covariate, look at one additional outcome measure, add a few observations – things that are totally within the bounds of normal practice in a lot of empirical fields across the social sciences– the odds that I'm going to find at least one significant effect are 60 percent now. Simple solutions to false-positive publications. These are for authors.
Skip to 4 minutes and 46 secondsAuthors have to decide the rule for terminating data collection before data collection begins and report it in the article. So no more of this "I'm going to add 10 or 20 observations." They're going to try to tie researchers' hands on that margin. You have to have at least 20 observations per cell or give some reason why you didn't. So no more of these "I'm going to have 10 or 12 observations," which happens in lab experiments in social psychology and even in some branches of experimental work, like economics, and gives you massively underpowered analyses.
Skip to 5 minutes and 14 secondsFor me as an outsider to this literature, I've always asked myself, and I think that's part of the motivation for what they're doing, why don’t these folks do like half as many experiments and have twice as big a sample size to actually say something definitive? But of course if you're in a world of false-positive results, that's the last thing you want to do. You want to have the possibility of false-positives, and big samples will kill all your zero results. So you want small samples and you live on the sampling variation and publish off that. Isn't that terribly cynical? But that's basically what they're saying in this article. Authors must list all the variables they collected.
Skip to 5 minutes and 50 secondsYou can't just not tell us that you collected another proxy for the same thing, that ex-ante was just as good as the one you published. You have to tell us all three of them or four of them or two of them. Authors have to report all the experimental conditions. These are just tying their hands against all the things they were warning about in the previous table. If you had a certain treatment in this experiment, tell us about it and show us the data. Don't exclude it without telling us about it. If observations are eliminated, because they're outliers or for some other reason, you have to report what the results are if you include the data.
Skip to 6 minutes and 22 secondsSo this is the beginning of their robustness. If you start dropping data, you may have a good reason for doing so, but I still want to see the full sample results. Yes, you can argue and make the case for dropping them, but don't hide it.
Skip to 6 minutes and 34 secondsAnd again, same exact thing: if you run analysis with a covariate, I want to see the unadjusted analysis. And again, there may be very good reasons for having the covariate. You make that case, but I want to see the unadjusted one. And I'm going to believe your result a lot more if it holds in both cases 5 and 6, and that's what a robustness table really is.
In this video, we explore how flexibility, common in the collection and analysis of data, can increase the likelihood of a false-positive result. Joseph Simmons, Leif Nelson, and Uri Simonsohn, the authors of “False-positive Psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant”, demonstrate this using experiments and computer simulations, and also give simple guidelines that researchers and reviewers can use to reduce the incidence of these errors.
Find the full paper in the next step.
© Center for Effective Global Action