Replication is a central pillar of the scientific method and of the scientific enterprise. Let’s say somebody in physics comes up with a new result: they discover a particle, or they claim some other new finding. It isn’t really considered a real result until other labs can replicate it, over and over again. If the finding really is capturing some physical law, or helping us understand some physical law, others should be able to replicate it. So, you know, the analogy to the social sciences isn’t perfect right off the bat, because if you find something in one group of people or one society, it’s not necessarily clear that it will hold elsewhere. We don’t have social, psychological, or economic laws in the way we think particles obey laws in physics. So replication is always going to be a little different in the social sciences than in the physical sciences.

Just to start out, we should be resigned to the fact that we are going to have more heterogeneity in our treatment effect estimates, inherently, because of differences in sample and setting, and differences over time. People change over time. So even if you run the same experiment with the same people a week later, you may get a different result. That’s just a starting point. At the same time, the notion that a real result, a result we can believe in, is something that shows up in more than one data set or more than one analysis still holds. If we are going to have confidence in a result, we want to believe that it holds a little more broadly than that.

Hamermesh, in his article, makes a distinction between what he calls “pure replication” and “statistical replication.”
Pure replication is the ability to take the same data, use the exact same analysis, do exactly what the original authors did, and get the exact same results. So, you know, you get my data; unless I’ve made an error, you should be able to run the code and get the same results. And it turns out there are errors in code. People make mistakes.

Marty Feldstein wrote a very famous paper in 1974 that got a lot of attention. It claimed that when Social Security was rolled out (this huge social program, the biggest social program in the US), it contributed to a massive reduction in private savings. And you can imagine why. If I know, or believe, that I’m going to get a pension later in life, I may save less now. So I just spend freely now and save less. But of course, in the aggregate, that could have big implications for growth if we’re saving less, accumulating less capital, and so on. So the paper got a lot of attention. And Feldstein, of course, was chairman of the Council of Economic Advisers under Ronald Reagan and was pushing this kind of agenda.

It turns out that this paper had a massive error in the construction of the main Social Security wealth variable, that is, what people expected to receive in Social Security benefits. And when you fix that error, the estimated effect goes to zero: there was no detectable effect on private savings. That’s a big deal! This was a top journal and one of the country’s and the world’s most prominent economists, writing something with huge public policy implications. For about six years this paper was floating around influencing policy, and if the error hadn’t been found, maybe it would have influenced policy even more. Who knows what Reagan would have done in the early ’80s if this finding had been considered solid? Maybe his administration would have gone after Social Security. I don’t know. I mean, it’s possible.

This was a pure replication exercise, and it was really important.
The second type is statistical replication, where you’re testing the same hypothesis. In one case, you choose to use different methods. Say someone is critiquing a particular econometric approach in the minimum wage literature, and they say, “You know, the model that’s used is just wrong, for various statistical reasons. I am going to use the more appropriate model.” That would be a statistical replication; we call it a “re-analysis.” In the other case, you test the same hypothesis in a different data set, at a different time, or in a different setting. You might think of that as a robustness check, as an extension, or as a way to test external validity. People use a lot of different terms for it, and we’ll come back to this later. But there is probably somewhat more replication research going on than we think, because a lot of that work is never called “replication research.” Someone just says, “Oh, I’m going to test the effect of a minimum wage increase on unemployment in Saskatchewan, Canada, rather than in New Jersey,” and calls it a minimum wage study, not a “replication.” But it kind of is a replication: you are testing the same hypothesis in a new setting.
Introduction to replication
This activity consists of a series of videos that examine the different types of replication and the inherent differences between replications in the social sciences and those in the life or physical sciences. It also introduces a viewpoint article by Dr. Daniel Hamermesh that discusses the importance of replication and how the scientific community can create incentives to publish replications. The next two videos focus on different aspects of Dr. Hamermesh’s paper, and we go into more depth below.
Replicability and replication are central to ensuring the validity of scientific findings. Pure replications can verify that a past analysis was done correctly, and statistical replications provide some evidence that a study’s (or a set of studies’) findings hold beyond the original data.
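To make the distinction concrete, here is a toy sketch in Python. It is an illustration only, not Hamermesh’s procedure: the data are hypothetical and synthetic, produced by a made-up `make_sample` helper, and the estimator is a hand-written bivariate OLS slope (`ols_slope`). A pure replication reruns the same code on the same data and should reproduce the reported estimate exactly; a statistical replication tests the same hypothesis (here, a positive slope) on a fresh sample, where the estimate will differ somewhat even if the hypothesis holds.

```python
import random


def ols_slope(xs, ys):
    """Slope coefficient from a simple bivariate OLS regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def make_sample(seed, n=500, true_slope=0.5):
    """Hypothetical synthetic data set; the seed stands in for 'which sample'."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


# "Original study": a hypothetical data set and its reported estimate.
original_xs, original_ys = make_sample(seed=1)
reported_estimate = ols_slope(original_xs, original_ys)

# Pure replication: same data, same code, so it should match the reported number.
pure = ols_slope(original_xs, original_ys)
pure_replicates = abs(pure - reported_estimate) < 1e-12

# Statistical replication: same hypothesis (slope > 0), tested on a new sample.
new_xs, new_ys = make_sample(seed=2)
statistical = ols_slope(new_xs, new_ys)

print(f"reported={reported_estimate:.3f} pure={pure:.3f} new sample={statistical:.3f}")
print("pure replication matches:", pure_replicates)
print("hypothesis (slope > 0) holds in new sample:", statistical > 0)
```

In a real pure replication the interesting case is when the rerun does not match, as with the Feldstein Social Security wealth variable discussed in the video; in a statistical replication, the question is whether the new estimate supports the same conclusion, not whether it is numerically identical.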
In the viewpoint article “Replication in Economics,” economist Daniel Hamermesh discusses both types of replication and examines why replication is so rare in published social science literature.
Pure replications, which use the same data as a previously published study to check for errors, are disturbingly rare, at least in economics. Dr. Hamermesh sent a survey to authors of empirical studies published between 2002 and 2004 in two leading labor economics journals, the Industrial and Labor Relations Review (ILRR) and the Journal of Human Resources (JHR), and found that the vast majority of these authors had never received a request of any kind for their data sets.
Hamermesh then gives three incentives for making research replicable and sharing data:
Fear of embarrassment or retaliation by peers or providers of funding should incentivize “careful documentation and maintenance of one’s data sets”.
Trust or credibility in social science research, and thus its usefulness in policymaking, rests on replicability. Hamermesh points out that “[t]he incentives we face here are clear: our ideas are unlikely to be taken seriously if our empirical research is not credible, so that the likelihood of positive payoffs to our research is enhanced if we maintain our data and records and ensure the possibility of replication. … [T]he greater ease of communication worldwide may have enhanced these returns, particularly in the areas of influencing policy and stimulating students.”
Finally, the cost-benefit relationship of replications of highly visible studies, especially as technology continues to reduce the costs associated with sharing data, suggests that replications should be more common. “[T]he likelihood of somebody attempting replication rises with the visibility of the published study and its author and decreases with the visibility of the potential replicating author. Under those circumstances the benefits of replicating are greater and the costs are lower. Technology has diminished the costs of providing the materials necessary for replication at the same time that changes in the publication process in economics have increased the benefits to authors of maintaining the records that might make replication possible.”
Hamermesh also gives advice to both researchers seeking to perform replications and authors whose studies have been replicated. In general, authors should make their data and code readily available and usable; replicating authors should “take a gentle, restrained professional tone in the comment”; and replicated authors should admit mistakes honestly and swiftly.
He states that
“…given the media interest in reporting novel or titillating empirical findings and politicians’ desires to robe their proposals in scientific empirical cloth, however novel or inconsistent with prior research, it is crucial that as a profession we ensure that replication, or at least fear of replication, is our norm. Empirical economics is never going to become a laboratory science, but recognizing the role of replication can move us slightly in that direction by preventing us from propagating erroneous results.”
He also suggests that journal editors take the lead on this issue by authoring a few highly visible, high-quality replications of their own, both to provide a model for future replications and to normalize the practice.
Similarly rare are scientific replications (or statistical replications) that re-examine “an idea in some published research by studying it using a different data set chosen from a different population from that used in the original paper.” Such replications are extremely important for establishing the external validity of studies. Hamermesh asserts:
“By far the most important justification for scientific replication in nonexperimental studies is that one cannot expect econometric results produced for one time period or for one economy to carry over to another. Temporal change in econometric structure may alter the size and even the sign of the effects being estimated, so that the hypotheses we are testing might fail to be refuted with data from another time. This alteration might occur because institutions change, because incentives that are not accounted for in the model change and are not separable from the behaviour on which the model focuses, or, crucially, that even without these changes the behaviour is dependent on random shocks specific to the period over which an economy is observed.”
Furthermore, “[i]f our theories are intended to be general, to describe the behaviour of consumers, firms, or markets independent of the social or broader economic context, they should be tested using data from more than just one economy.”
Within-study replications, that is, studies that use multiple data sets drawn from temporally or geographically distant sources, are currently our best bet for scientific replication in the social sciences. Journal editors have few incentives to publish external replications, because “the profession puts a premium on the creativity and generality of the idea, not on verifying the breadth of its applicability.” Hamermesh concedes, however, that “the incentives for doing within-study scientific replication are non-existent.” He therefore argues that “it is crucial that editors of the leading journals tilt the publishing process a bit more in favour of within-study scientific replication.”
What are the incentives for a researcher to replicate another’s data, code, or study?
The full paper is:
Hamermesh, Daniel S. 2007. “Viewpoint: Replication in Economics.” Canadian Journal of Economics/Revue Canadienne D’économique 40 (3): 715–33. doi:10.1111/j.1365-2966.2007.00428.x.
© Center for Effective Global Action