This content is taken from the University of California, Berkeley, Center for Effective Global Action (CEGA) & Berkeley Initiative for Transparency in the Social Sciences (BITSS)'s online course, Transparent and Open Social Science Research.

This is the Lawrence Sanna piece that ended up being retracted, and it tests a particular hypothesis. I'm an outsider to this field, and it doesn't seem like a staggeringly plausible hypothesis to me: does higher physical altitude lead to more pro-social behavior, because individuals draw some analogy between the moral high ground and the physical high ground? That's the underlying idea. In the experiment, participants were in a theatre of some kind. Some were led up onto the stage and asked to make decisions, others were led down into the orchestra pit to make decisions, and others stayed at the same level.

The idea is that if you're up on high, you will behave in a more magnanimous, pro-social fashion. This is the task: you're preparing food for other participants. If you use less hot sauce, you're being nicer to them; if you use more hot sauce, you're less pro-social. I don't know what kind of hot sauce it was. You know, here in California we care about that.

So there are different treatments: the high ground, the control, and the low ground. Then there are the different types of behavior being studied. This is a reproduction of the table in the original paper. Again, these are the actual heights: high, low, control. And these are the different outcomes. Simonsohn is going to focus on these three. This one is the grams of hot sauce you used, so it's your measure of compassion: if you take the moral high ground, you use fewer grams of hot sauce than if you were down in the orchestra pit. That's the idea.

And you can see, and this is what Uri Simonsohn says caught his eye, that if you look at the variability in the data (these are the sample standard deviations), they look really similar, even though the samples are pretty small. The same is true for these outcomes, and again for these down here. So his point is: wow, those sample standard deviations look really similar. A sample standard deviation is itself a random variable, and it has its own standard deviation, an expected amount of variability. With a relatively small sample, that random variable, the sample standard deviation, should be expected to have some spread, and these don't have much spread.
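To see how much a sample standard deviation should bounce around in small samples, here is a minimal Python sketch. It assumes NumPy and normally distributed data; the function name `sample_sd_spread` and the sample sizes are illustrative choices, not taken from the paper. It draws many small samples, measures the spread of their sample SDs, and prints the standard normal-theory approximation σ/√(2(n−1)) for comparison.

```python
import numpy as np

def sample_sd_spread(n, sigma=1.0, n_sims=20_000, seed=0):
    """Draw n_sims samples of size n from Normal(0, sigma); return the mean
    and the standard deviation of their sample SDs."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(0.0, sigma, size=(n_sims, n))
    sds = draws.std(axis=1, ddof=1)  # one sample SD per simulated sample
    return sds.mean(), sds.std(ddof=1)

# Illustrative sample sizes (hypothetical, not the cell sizes in Sanna et al.)
for n in (15, 30, 100):
    mean_sd, sd_of_sd = sample_sd_spread(n)
    # Normal-theory approximation for the SD of the sample SD: sigma / sqrt(2(n - 1))
    approx = 1.0 / np.sqrt(2 * (n - 1))
    print(f"n={n:3d}  mean SD={mean_sd:.3f}  SD of sample SD={sd_of_sd:.3f}  approx={approx:.3f}")
```

With 15 observations, the sample SD itself has a standard deviation of roughly 0.19σ, so sample SDs that agree almost exactly across several small arms are a coincidence worth checking.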

So what does he do? At first he didn't have the raw data, so he assumes the outcome follows a particular distribution; as a starting point, he assumes it's normally distributed. (Later he gets the raw data, so he no longer has to worry about that assumption.) He says, "Okay, I know roughly what the mean is in each arm." He computes the pooled standard deviation across the three arms, which imposes an artificial commonality across the arms that he doesn't even know holds. Then he takes a hundred thousand simulated draws of the data to get a sense of what the distribution of sample standard deviations looks like. That's the idea, and it's actually pretty straightforward.

Then he compares the standard deviations reported in the paper to this simulated distribution of sample standard deviations. So what does he find? Across his hundred thousand simulations, the sample standard deviations have the nice shape we'd expect them to have. And it turns out that the Sanna et al. piece is way, way out in the tail. The p-value here, pooling across those three different outcomes, is 0.00015: 15 out of 100,000 cases. So it seems really unlikely that these data were generated from random samples.
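The parametric procedure can be sketched in Python roughly as follows. This is a minimal sketch, assuming NumPy, normally distributed outcomes, and a single outcome. The summary statistic (the SD across arms of the arm sample SDs), the function names, and all numbers are illustrative choices, not Simonsohn's code or Sanna's data; pooling across several outcomes, as in the p-value quoted above, is not shown.

```python
import numpy as np

def pooled_sd(sds, ns):
    """Pooled standard deviation across arms with sample SDs `sds` and sizes `ns`."""
    sds, ns = np.asarray(sds, float), np.asarray(ns, float)
    return np.sqrt(np.sum((ns - 1) * sds**2) / np.sum(ns - 1))

def simulate_sd_of_sds(means, sds, ns, n_sims=100_000, seed=0):
    """Simulate n_sims datasets in which every arm is Normal(arm mean, pooled SD).
    Return, for each simulated dataset, the SD across arms of the arm sample SDs."""
    rng = np.random.default_rng(seed)
    sigma = pooled_sd(sds, ns)  # the same SD is imposed on every arm
    sim_sds = np.empty((n_sims, len(ns)))
    for j, (mu, n) in enumerate(zip(means, ns)):
        draws = rng.normal(mu, sigma, size=(n_sims, n))
        sim_sds[:, j] = draws.std(axis=1, ddof=1)
    return sim_sds.std(axis=1, ddof=1)

def left_tail_p(means, sds, ns, n_sims=100_000, seed=0):
    """Share of simulated datasets whose SD of SDs is at most the reported one."""
    observed = np.std(sds, ddof=1)
    return np.mean(simulate_sd_of_sds(means, sds, ns, n_sims, seed) <= observed)

# Hypothetical summary statistics for one outcome (grams of hot sauce),
# three arms of 15 participants each; NOT the numbers from Sanna et al.
means, ns = [40.0, 55.0, 70.0], [15, 15, 15]
print("suspiciously similar SDs:", left_tail_p(means, [20.1, 20.3, 20.2], ns))
print("ordinary-looking SDs:    ", left_tail_p(means, [14.0, 20.0, 27.0], ns))
```

Because the sample SD of normal data does not depend on the mean, the arm means are carried along only to follow the description; the imposed common SD is what drives the comparison.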

That's the first conclusion. Then he received the raw data, and with it he can do the same thing: take draws, basically random samples with replacement from the existing data, to create effectively the same distribution of sample standard deviations. He does that, gets almost the same p-values, and again the pattern is extremely unlikely. In many ways this approach is preferable, in case the outcome isn't normally distributed or there are some weird correlations between different outcomes: when you take these draws with replacement from the real data, you capture those features. So this is the preferred method, and you get the same answer.
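A bootstrap version of the same check might look like the sketch below. It rests on our own assumptions: each arm is centered at its own mean and the residuals are pooled, so that resampling imposes a common spread across arms (mirroring the pooled SD in the normal-based version); the transcript does not say whether Simonsohn resampled exactly this way. It assumes NumPy, and the function names and data are hypothetical.

```python
import numpy as np

def bootstrap_sd_of_sds(arms, n_sims=100_000, seed=0):
    """Bootstrap the SD (across arms) of arm sample SDs from raw data.

    Each arm is centered at its own mean, the residuals are pooled (so every arm
    shares one distribution under the null), and each arm is redrawn from that
    pool with replacement, keeping its original sample size."""
    rng = np.random.default_rng(seed)
    arms = [np.asarray(a, float) for a in arms]
    residuals = np.concatenate([a - a.mean() for a in arms])
    sim_sds = np.empty((n_sims, len(arms)))
    for j, a in enumerate(arms):
        draws = rng.choice(residuals, size=(n_sims, a.size), replace=True)
        sim_sds[:, j] = draws.std(axis=1, ddof=1)
    return sim_sds.std(axis=1, ddof=1)

def bootstrap_p(arms, n_sims=100_000, seed=0):
    """Share of bootstrap replications whose SD of SDs is at most the observed one."""
    observed = np.std([np.std(a, ddof=1) for a in arms], ddof=1)
    return np.mean(bootstrap_sd_of_sds(arms, n_sims, seed) <= observed)

# Hypothetical raw data: three shifted copies of one column have identical SDs,
# while independently drawn arms show ordinary sampling variation.
base = np.array([12, 35, 20, 48, 5, 30, 27, 41, 16, 38, 9, 24, 33, 19, 44], float)
too_tidy = [base + 10, base + 20, base + 30]
rng = np.random.default_rng(1)
ordinary = [rng.normal(m, 15, size=15) for m in (40, 55, 70)]
print(bootstrap_p(too_tidy, n_sims=20_000), bootstrap_p(ordinary, n_sims=20_000))
```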

What Simonsohn does next is compare this standard deviation of sample standard deviations (I love that phrase) in this study to other published studies that have commonalities with it: studies that use a common outcome measure or look at similar things. Specifically, he looks at studies that cite the same papers the Sanna et al. piece cited, so studies drawn from the same literature. He compares the SI, which is the standard deviation of sample standard deviations, and the Sanna et al. results are clear outliers. The SI term on the left-hand side here is for three sub-studies, numbers two, three, and four in Sanna et al.

And you can see there's very little variation in those sample standard deviations. But there are other studies that share features with these. For instance, there's a sub-literature of studies that use hot sauce, and in that sub-literature the SI tends to be pretty large: 100 to 200 percent, not 12 percent. That's roughly an order of magnitude more variability. Again, Sanna was a bad cheater. There are other studies that measure time spent on some unsolvable task, and public goods studies where people contribute to a common resource.

They all have plenty of variability, and none of them comes anywhere close to Sanna et al. It just looks like this is not real data. Simonsohn then looked at several of Sanna's other papers, and all of them had the same property of too little variation in sample standard deviations. So it was a systematic pattern; it looked like he was probably making up his data the same way each time.

He also looks at other statistics. For instance, one statistic he examines is the difference between the minimum and maximum values in a sample. That turns out to be a very volatile characteristic, yet the difference between the max and the min is almost exactly the same in all these arms. Just very weird-looking data. He kept looking, kept finding odd patterns, and became almost certain that the data had been made up. Since then, a large number of Sanna's publications have been retracted (I'm not sure if it's all of them), and he has resigned.
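The same simulation idea can be pointed at the range. The transcript doesn't say exactly how Simonsohn tested this statistic, so the following is only an illustrative sketch, assuming NumPy, normal data, and the pooled SD of the observed arms as the common σ; the function names and data are hypothetical.

```python
import numpy as np

def simulate_sd_of_ranges(sigma, ns, n_sims=100_000, seed=0):
    """Simulate normal data with common SD `sigma`; for each simulated dataset,
    return the SD across arms of the arm ranges (max minus min)."""
    rng = np.random.default_rng(seed)
    ranges = np.empty((n_sims, len(ns)))
    for j, n in enumerate(ns):
        draws = rng.normal(0.0, sigma, size=(n_sims, n))  # the mean doesn't affect the range
        ranges[:, j] = draws.max(axis=1) - draws.min(axis=1)
    return ranges.std(axis=1, ddof=1)

def range_p(arms, n_sims=100_000, seed=0):
    """Share of simulations whose SD of ranges is at most the observed one."""
    arms = [np.asarray(a, float) for a in arms]
    ns = [a.size for a in arms]
    # Pooled SD of the observed arms, used as the common sigma (an assumption)
    sigma = np.sqrt(sum((a.size - 1) * a.var(ddof=1) for a in arms) / sum(n - 1 for n in ns))
    observed = np.std([a.max() - a.min() for a in arms], ddof=1)
    return np.mean(simulate_sd_of_ranges(sigma, ns, n_sims, seed) <= observed)

# Hypothetical arms: shifted copies of one column, so every arm has the same range.
base = np.array([12, 35, 20, 48, 5, 30, 27, 41, 16, 38, 9, 24, 33, 19, 44], float)
too_tidy = [base + 10, base + 20, base + 30]
print(range_p(too_tidy, n_sims=20_000))
```

The range depends only on the two most extreme observations, which is why it varies so much from sample to sample and why near-identical ranges across arms stand out.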

How open data can facilitate the detection of fraudulent research

In a 2013 paper, Uri Simonsohn reported abnormalities in a published study by Lawrence Sanna and his colleagues. The study, which claimed to answer the question "Does higher physical altitude lead to more prosocial behavior?", showed very little variability in its standard deviations for such small samples. In this video, I'll take you through Simonsohn's technique for showing that the data presented in this paper were highly unlikely to have come from a real experiment. The results were so striking that they led him to revisit many of Sanna's other papers. Watch the video to see what he found.
