This content is taken from the University of California, Berkeley, Center for Effective Global Action (CEGA) & Berkeley Initiative for Transparency in the Social Sciences (BITSS)'s online course, Transparent and Open Social Science Research.

Skip to 0 minutes and 1 second The second paper is a Smeesters paper. Again we're going to look at a different statistic, but once more we're going to look for too little variance. The same kind of idea. This is a different social psychology experiment. The conventional wisdom in social psychology is that certain colors evoke certain moods or states of mind, which then produce certain behaviors. So the idea is that the color red makes people break with the rules, blue makes them sort of acquiesce, and the color white means they follow the rules when they're supposed to and deviate when they're supposed to. That was my understanding from reading the article: that certain colors evoke certain things.

Skip to 0 minutes and 42 seconds So they basically had subjects answer questions that were very closely linked to stereotypical attributes of certain types of people, by gender, by demographics, etc. But they handed them the instructions for this survey in different colored folders. And the idea is that if you hand someone a red folder, they're going to tend to break with the stereotypes, and with blue they're going to go along with them, and so on. So it turns out there were these three different color conditions, and then four different types of tests. That's twelve arms: three colors times four outcomes. The theory predicts that for six of them, you're going to answer a lot of the twenty questions you're asked in the affirmative.

Skip to 1 minute and 41 seconds And for the other six of those twelve, you're going to answer fewer of them in the affirmative. Because the color is going to push you towards the affirmative, let's say, if you get blue, but it's going to push you away from answering yes if you get red. So that's the prediction: twelve arms, six predicted high, six predicted low.

Skip to 2 minutes and 3 seconds Okay, so we have the six low-predicted arms on the left and the six high-predicted arms on the right. This is yes answers out of twenty questions. And the thing that caught Simonsohn's eye here is that all the low-predicted arms have almost exactly the same mean; they're all 9 point something. And almost all the high-predicted arms have almost exactly the same mean, between 11.4 and 12 out of 20. Now these were actually somewhat different questions. There were some about gender stereotypes; there was a picture of Albert Einstein in one of them. It's hard to see why those treatments would generate exactly the same response, even if they're predicted to have the same sign.

Skip to 2 minutes and 51 seconds You know, this is a little bit of the obsession in a lot of the social sciences with rejecting the null. Some arms predicted you'd have a positive effect; some predicted a negative effect. But as far as I can tell from the description in this paper, none of them make very clear predictions about the exact magnitude of the effect. But somehow they have exactly the same impact, which just seems totally bizarre. Twelve out of twelve, exactly nailed it. It's like you just keep hitting shots from mid-court, to use a basketball analogy. You hit twelve in a row, every time. That's just not going to happen. Okay, so this was the concern.

Skip to 3 minutes and 30 seconds How likely is it, given a certain distribution, say that the outcome is normally distributed with a certain mean and standard deviation, that those six means are going to be that close together? Same thing for the high conditions. So again, it's the same kind of idea. He's going to use the summary statistics in the article itself. He's actually going to impose the same distribution for all the lows, and then a separate common distribution for all the highs. He's rigging the data to look more similar than it probably is. And despite rigging the data as much as possible, when he takes his 100,000 draws of a normal with those characteristics.

Skip to 4 minutes and 9 seconds He finds it's just incredibly unlikely you would get this pattern: 0.00021. So this is 21 out of 100,000 cases, again. This wasn't even close. And this is the equivalent plot: what's the likelihood that the arm means are this similar? It's way out in the tail.
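The simulation Simonsohn runs can be sketched in a few lines. This is a minimal illustration of the idea, not his actual code: every parameter below (mean, SD, sample size per arm, and the observed spread of the six arm means) is invented for illustration; the real values would come from the summary statistics reported in the Smeesters paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, for illustration only:
mu, sigma = 9.5, 4.0    # common mean and SD imposed on all six "low" arms
n_per_arm = 15          # subjects per arm (assumed)
n_arms = 6
observed_range = 0.4    # spread (max - min) of the six reported arm means
n_sims = 100_000

# Each simulated experiment draws six arm means from the sampling
# distribution of a sample mean: Normal(mu, sigma / sqrt(n_per_arm)).
arm_means = rng.normal(mu, sigma / np.sqrt(n_per_arm),
                       size=(n_sims, n_arms))
ranges = arm_means.max(axis=1) - arm_means.min(axis=1)

# How often are six independent arm means at least this tightly clustered?
p_value = (ranges <= observed_range).mean()
print(p_value)
```

Even with the common distribution deliberately rigged to make the arms look alike, only a tiny fraction of simulated experiments produce means as tightly bunched as the made-up `observed_range` here, which is the logic behind the 21-in-100,000 figure.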

Skip to 4 minutes and 38 seconds The patterns that were documented are just so implausible that they could not have been generated by any real data, so the ethics committee at Erasmus University concluded he must have committed fraud. The other thing was, in all these cases there was this pattern. It came out in the Stapel case, and it comes out in this case: the accused wouldn't share their data with other people. They wouldn't share their data with their research assistants. They wouldn't share their data with their co-authors. And that's a common pattern. So I think the bottom line is, it's just hard to fake data and make it look real. Maybe people could do better than these folks.

Skip to 5 minutes and 16 seconds But there's just a lot of statistics you'd have to fake. So here's another one you'd have to fake to get it right. Another Smeesters study was a willingness-to-pay study, something that economists love, and people in other fields too. This one was willingness-to-pay for different types of T-shirts. But it turns out that when you ask people for willingness-to-pay data, their answers bunch up on multiples of five. Everybody knows that; anybody who's looked at real-world willingness-to-pay or valuation data sees these kinds of patterns. In low-income countries, where a number of us in the room collect data, ages are also heaped at multiples of five.

Skip to 5 minutes and 57 seconds If you ask old folks in rural Malawi their age, you'll get a lot of 50s and a lot of 60s, and not very many 67-and-a-halves. There's going to be this bunching. So he plots this data out; there's Smeesters again. In one of these studies exactly 20% of observations were multiples of five. In every other real-world study, it's around 80% that are multiples of five, because everybody will go, "Oh, I'll pay you five bucks, I'll pay you ten bucks for it."
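The bunching check itself is a one-liner. Here is a small sketch with fabricated example data: the response values and their weights are invented to mimic the rounding behavior described above, and the "fake-looking" series is simply drawn uniformly, which lands on a multiple of five only about a fifth of the time.

```python
import numpy as np

rng = np.random.default_rng(1)

def share_multiples_of_five(values):
    """Fraction of responses that land exactly on a multiple of five."""
    values = np.asarray(values)
    return float(np.mean(values % 5 == 0))

# Hypothetical real-looking responses: people round, so most answers
# land on 5, 10, 15, ... (these weights are invented for illustration).
real_like = rng.choice([5, 7, 10, 12, 15, 20, 25], size=1000,
                       p=[0.25, 0.05, 0.30, 0.05, 0.15, 0.15, 0.05])
print(share_multiples_of_five(real_like))  # high, roughly 0.9

# A series drawn uniformly from 1..40 lacks this rounding and lands on
# a multiple of five only about 20% of the time.
fake_like = rng.integers(1, 41, size=1000)
print(share_multiples_of_five(fake_like))  # low, roughly 0.2
```

Real elicited valuations look like `real_like`; a dataset whose share of round numbers sits near the uniform baseline, as Smeesters's did, is the red flag.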

Skip to 6 minutes and 28 seconds And in another one Smeesters got a little wiser, and it was 25% or something like that. But it's yet another dimension of the data that is just a complete outlier. I don't even know how much time Simonsohn spent on paper after paper, writing the code and creating these figures.

Skip to 6 minutes and 50 seconds Because he also writes his own papers, as we've seen. But he did a real service for the field of social psychology in documenting how pervasive these problems were: well-established faculty in multiple countries were almost certainly making up their studies. Okay, so what does Simonsohn say next? He provides some advice. Before you jump into this, be really sure. Get a lot of data; get a lot of evidence from multiple studies before you even raise this. And that's actually good advice, because this ended the careers of prominent scholars. And so if someone were to raise concerns based on

Skip to 7 minutes and 33 seconds the sort of spurious evidence and sort of ruin the career of someone who didn’t commit fraud, that would obviously be very costly. He also contacted the authors and asked them for data. He was very transparent about his concerns and whatnot. And then this is probably the most important point. The last one is if you do have evidence along these lines, approach the relevant authorities, investigative authorities and their institution. Like don’t go to the media. Don’t go to bloggers. There is actually a mechanism. Academic institutions and research institutions have mechanisms for dealing with this.

Another example of detecting fraudulent research using open data

Simonsohn also took a closer look at another study with data that looked too good to be true. In 2011, Dirk Smeesters published an article in the Journal of Experimental Social Psychology indicating that priming subjects with instructions presented in different colored folders affects their acquiescence to stereotypical behavior. As in Lawrence Sanna's case, Smeesters's data seemed to have too little variance to be true. Simonsohn's analysis led Erasmus University in Rotterdam (where Smeesters worked) to begin an investigation that ultimately led to his resignation. In this video I discuss how Simonsohn uncovered this fraud, as well as some advice he has for researchers interested in performing similar analyses.

