"Investigating variation in replicability" by Richard Klein, et al.
“Replication is a central tenet of science; its purpose is to confirm the accuracy of empirical findings, clarify the conditions under which an effect can be observed, and estimate the true effect size,” begin the authors of “Investigating Variation in Replicability.”
The authors give several reasons why replication matters:
“[f]ailure to identify moderators and boundary conditions of an effect may result in overly broad generalizations of true effects across situations… or across individuals,”
“overgeneralization may lead observations made under laboratory observations to be inappropriately extended to ecological contexts that differ in important ways,”
“attempts to closely replicate research findings can reveal important differences in what is considered a direct replication… thus leading to refinements of the initial theory,” and
“[c]lose replication can also lead to the clarification of tacit methodological knowledge that is necessary to elicit the effect of interest.”
However, replications of published studies are remarkably scarce in social science journals. One reason is that researchers, especially junior ones, fear retaliation from more senior, published scientists; another is limited interest from journal editors in publishing replications.
In an attempt to address these concerns, Richard Klein and 34 of his colleagues in the psychological sciences took on a “Many Labs” replication project. Their goal was to systematically and semi-anonymously evaluate the robustness of published results in psychology journals, as well as to “establish a paradigm for testing replicability across samples and settings and provide a rich data set that allows the determinants of replicability to be explored” and “demonstrate support for replicability for the 13 chosen effects.”
They selected 13 effects that had been published in behavioral science journals and attempted to replicate them across 36 labs and 11 countries, controlling for methodological procedures and statistical power. The effects included sunk costs, gain versus loss framing, anchoring, retrospective gambler’s fallacy, low-versus-high category scales, norm of reciprocity, allowed/forbidden, flag priming, currency priming, imagined contact, sex differences in implicit math attitudes, and the relation between implicit math attitudes and self-reported attitudes.
“In the aggregate, 10 of the 13 studies replicated the original results with varying distance from the original effect size. One study, imagined contact, showed a significant effect in the expected direction in just 4 of the 36 samples (and once in the wrong direction), but the confidence intervals for the aggregate effect size suggest that it is slightly different than zero. Two studies – flag priming and currency priming – did not replicate the original effects. Each of these had just one p-value < .05 and it was in the wrong direction for flag priming. The aggregate effect size was near zero whether using the median, weighted mean, or unweighted mean.”
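The three aggregates named in that passage (median, weighted mean, and unweighted mean) can be illustrated with a minimal Python sketch. The effect sizes and sample sizes below are hypothetical and are not data from the study, the function name is invented for illustration, and weighting by sample size is an assumption about how a weighted mean would be formed here.

```python
from statistics import median


def aggregate_effect_sizes(effects, sample_sizes):
    """Summarize per-sample effect sizes three ways.

    Weighting by sample size is an assumption made for this sketch.
    """
    if not effects or len(effects) != len(sample_sizes):
        raise ValueError("need at least one sample and one size per effect")
    unweighted = sum(effects) / len(effects)
    weighted = sum(d * n for d, n in zip(effects, sample_sizes)) / sum(sample_sizes)
    return {
        "median": median(effects),
        "unweighted_mean": unweighted,
        "weighted_mean": weighted,
    }


# Hypothetical per-lab effect sizes and sample sizes (NOT data from the study).
effects = [0.10, -0.05, 0.02, 0.00]
sample_sizes = [100, 200, 50, 150]

summary = aggregate_effect_sizes(effects, sample_sizes)
print(summary)
```

When all three summaries sit close to zero, as in this toy example, the conclusion does not depend on which aggregate is chosen, which is the point the authors make about the two priming effects.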
They also found very few differences in replicability and effect sizes across samples or lab settings and conclude “that most of the variation in effects was due to the effect under investigation and almost none to the particular sample used.”
This study is just one of a handful of “Many Labs” replication projects that began in 2011 at the Center for Open Science. These projects have continued to attract interest from researchers and publishers alike. Read more about the project and find other replications at https://osf.io/ezcuj/
Klein, Richard A., Kate A. Ratliff, Michelangelo Vianello, Reginald B. Adams Jr, Štěpán Bahník, Michael J. Bernstein, Konrad Bocian et al. “Investigating variation in replicability.” Social Psychology (2014).
© Center for Effective Global Action