This content is taken from the University of California, Berkeley, Center for Effective Global Action (CEGA) & Berkeley Initiative for Transparency in the Social Sciences (BITSS)'s online course, Transparent and Open Social Science Research. Join the course to learn more.

Available data makes for more credible science: two articles on open data

Many policies now being put into place require research to be accessible to anyone through public archives. Political scientist Allan Dafoe, author of “Science Deserves Better: The Imperative to Share Complete Replication Files,” advocates for replication transparency, saying that “good research involves publishing complete replication files, making every step of research as explicit and reproducible as is practical.”

Unfortunately, many authors preserve their data poorly, and it is too often lost. Dafoe’s paper argues that, with transparency and publication, “political science will become more refutable, open, cumulative, and accessible.” Without transparency, fraud threatens to reduce the public’s trust in science.

In “The Availability of Research Data Declines Rapidly with Article Age,” Timothy Vines et al. also defend the importance of data transparency by analyzing the effect of article age on data availability. The study formally investigated the relationship between a published paper’s age and four probabilities:

  1. the probability of finding at least one working e-mail for a first, last, or corresponding author in order to request data;
  2. the conditional probability of a response, given that at least one e-mail appeared to work;
  3. the conditional probability of getting a response that indicated the status of the data, given that a response was received; and
  4. the conditional probability that the data were extant, given that an informative response was received.
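Because each of these probabilities is conditional on the previous step succeeding, multiplying them gives the overall chance that a data request ends with confirmation that the data still exist. The sketch below illustrates that arithmetic; the function name and the four input values are hypothetical, chosen only for illustration, and are not figures from the paper.

```python
from math import prod

def chance_data_confirmed_extant(p_email, p_response, p_informative, p_extant):
    """Multiply the four chained probabilities from the list above."""
    return prod([p_email, p_response, p_informative, p_extant])

# Hypothetical inputs for illustration only (not taken from Vines et al.).
overall = chance_data_confirmed_extant(
    p_email=0.8,        # a working e-mail is found
    p_response=0.5,     # a response arrives, given a working e-mail
    p_informative=0.9,  # the response states the data's status
    p_extant=0.7,       # the data are extant, given an informative response
)
print(f"{overall:.3f}")
```

Even when each individual step looks fairly likely, the product can be small, which is why a decline at any single step matters.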

The authors found a negative relationship between the age of the paper and the probability of finding at least one apparently working e-mail, either through the journal or by searching online. In fact, for each additional year, the odds of finding a working e-mail fell by 7%. Additionally, there was a “negative relationship between age of the paper and the probability of the data set being extant (‘shared’ or ‘exists but unwilling to share’).” With each additional year after publication, the odds of the data being extant decreased by 17%. Finally, they found a slightly positive effect of article age on working e-mails found via web searches. Data from older studies tended to be unavailable mainly because the data sets had been lost or were stored on inaccessible media such as Zip or floppy disks. Restoring data from such media with modern computer infrastructure would take an excessive amount of time.
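Note that the 17% figure describes odds, not probability, so it compounds multiplicatively from year to year. A minimal sketch of what that implies, assuming a constant yearly decline and a hypothetical starting probability of 0.9 (the starting value and the function name are illustrative assumptions, not results from the study):

```python
def probability_after(years, start_probability, yearly_odds_decline=0.17):
    """Probability after `years` when the odds shrink by a fixed fraction each year."""
    start_odds = start_probability / (1 - start_probability)
    odds = start_odds * (1 - yearly_odds_decline) ** years
    return odds / (1 + odds)  # convert odds back to a probability

# The 0.9 starting probability is a hypothetical value for illustration only.
for years in (0, 5, 10, 20):
    print(years, round(probability_after(years, 0.9), 3))
```

Under these assumptions the probability falls slowly at first and then faster, because a fixed percentage drop in odds translates into a larger drop in probability once the odds approach 1.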

Because data can be useful in studies performed long after collection, the authors advocate preserving it in public archives, where it cannot be lost or withheld by authors.

These articles demonstrate how imperative data availability is for maintaining scientific credibility, both within the research community itself and in the public eye. In an effort to facilitate a transition toward more open data, Allan Dafoe makes various recommendations for how to produce good replication files:

For Statistical Studies:

  1. Do all data preparation and analysis in code.
  2. Adopt best practices for coding, including clear code, testing, and running code all the way through.
  3. Build all analysis from primary data files.
  4. Fully describe variables.
  5. Document every empirical claim.
  6. Archive your files.
  7. Encourage co-authors to adopt these standards.
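As one possible illustration of the first three recommendations, the sketch below keeps every step, from the primary data to the reported number, in a single script that runs from start to finish. The data, variable names, and function names are all hypothetical; in practice the primary data would be read from an archived raw file rather than an inline string.

```python
import csv
import io
import statistics

# Hypothetical stand-in for a primary data file; never edited by hand.
RAW_CSV = """respondent_id,age,income
1,34,52000
2,,48000
3,51,61000
"""

def load_primary(text):
    """Read the primary data exactly as collected."""
    return list(csv.DictReader(io.StringIO(text)))

def prepare(rows):
    """Drop rows with a missing age and convert types; each cleaning step lives in code."""
    return [
        {
            "respondent_id": int(r["respondent_id"]),
            "age": int(r["age"]),
            "income": float(r["income"]),
        }
        for r in rows
        if r["age"] != ""
    ]

def analyze(rows):
    """Compute the quantities that would be reported in the paper."""
    return {"n": len(rows), "mean_income": statistics.mean(r["income"] for r in rows)}

def run_all():
    """Run the whole pipeline, from primary data to results, in one call."""
    return analyze(prepare(load_primary(RAW_CSV)))

if __name__ == "__main__":
    print(run_all())
```

Because the cleaning rule (dropping the respondent with a missing age) is written down in `prepare`, a reader can see exactly how the analysis sample was built and rerun it unchanged.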

For Journals:

  1. Require complete replication files before acceptance.
  2. Encourage high standards for replication files.
  3. Implement replication audits.
  4. Retract publications with non-replicable analyses.

Data sharing and transparency are scientific public goods, benefiting many and lowering the barrier to entry for students and junior researchers. Open science provides tools, incentivizes caution in study designs, and can produce much more credible research.

Why do you think scientists and researchers don’t make more of an effort to preserve their data after publication?

If you want to dive deeper into the material, you can read both papers in full; they are cited below.


Dafoe, Allan. 2014. “Science Deserves Better: The Imperative to Share Complete Replication Files.” PS: Political Science & Politics 47 (1): 60–66. doi:10.1017/S104909651300173X.

Vines, Timothy H., Arianne Y. K. Albert, Rose L. Andrew, Florence Débarre, Dan G. Bock, Michelle T. Franklin, Kimberly J. Gilbert, Jean-Sébastien Moore, Sébastien Renaut, and Diana J. Rennison. 2014. “The Availability of Research Data Declines Rapidly with Article Age.” Current Biology 24 (1): 94–97.
