[0:00] These are the rules, and they’re kind of tongue in cheek. Love your data, and help others love it too. Because your data is good enough and it’s strong enough and, gosh darn, people like it! Share your data online with a permanent identifier. That’s a pretty good one and data repositories will always do that. Therefore, in the published paper, or in what you share with people, they can always go back to that permanent identifier.

[0:26] Conduct your research with “reuse” in mind. When you collect data, you may expect to use it for one paper. You may want to use it for another paper. Someone else may want to use it for a totally different purpose. Sometimes we collect data on these variables that are just covariates for us, or things we don’t even know why we’re collecting some of these variables. For someone else, it could be essential to their research agenda. So, take it seriously. Publish workflow as context. Meaning, provide an appendix that shows how the data was handled. That shows the timing of when you did what, as part of the research process so people can understand what you’re doing.

[0:59] Maybe learn from it, maybe critique it. Link your data to your publication. So when you publish your data, link to your Dataverse site. Link to whatever repository you’ve posted your data in so other people can easily find it.

[1:14] Rule six: Publish your code. Even the small bits. Even stuff you think is silly. “Oh, my data cleaning code” or “Oh, my data construction. Constructing this or that variable.” Put it all out there because there may be an error. Maybe someone can improve on it. Maybe somebody can learn from it? Maybe that’s the aspect of your work that you find trivial that they find really interesting. State how you want to get credit. This is with data. So, when you actually post data, let’s say in Dataverse, make sure you put in, “If you use this data please put in the following citation.” Make it clear how you want to get cited.

[1:48] The more people’s data gets cited, the more serious they’re going to be. And that counts towards their citation count. The more serious they’re gonna be and the greater the professional rewards are going to be for actually publishing their data. Foster and use data repositories. Reward colleagues who share their data properly. So maybe within your professional communities recognize people who are good and transparent with their data.

[2:13] The last one, I don’t know what to make of it: be a booster for data science. So, promote data science. Okay. So, that’s kinda nice.

Rules for open data

We know that sharing data improves the integrity of scientific research, helps other researchers who are interested in using others’ data or in replicating their studies, and may even increase the likelihood a paper will be cited. But how can researchers ensure this sharing process is efficient or present their data in a way that is useful? Astronomer Dr. Alyssa Goodman and fourteen of her colleagues came up with ten rules for open data. I briefly go through these rules in this video and elaborate on them below.


“Ten Simple Rules for the Care and Feeding of Scientific Data” was written by a group of 15 researchers who wanted to help scientists “ensure that their data and associated analyses continue to be of value and to be recognized.” Today, many studies cannot be reproduced or verified because their data are unavailable or poorly described.

Goodman and her colleagues’ 10 rules are as follows:

  1. Love your data, and help others love it, too – If you make your data easily accessible, others are more likely to do that as well. The authors encourage scientists to “cherish, document, and publish” their data, and encourage others to follow.

  2. Share your data online, with a permanent identifier – Authors should try to deposit their data in an archive that acts as the “go to” place for their field. Having a good host for data allows it to be more accessible and long-lasting.

  3. Conduct science with a particular level of reuse in mind – The authors use the word “provenance,” defined as “the sum of all the processes, people, and documents involved in generating or otherwise influencing or delivering a piece of information,” to describe a study’s level of reusability. Better documentation means higher-quality provenance and a higher chance of data reuse. Reuse is most feasible when the data, the metadata, and information about how the data were generated are all provided. Thus, scientists should plan according to the level of reuse they want their experiment to have, and adopt the appropriate standard formats.

  4. Publish workflow as context – “Publishing a description of your processing steps offers essential context for interpreting and reusing data,” the authors write. Workflow is a term that describes a project’s data collection methods and analysis. While some workflow software exists, the authors suggest that, at a minimum, researchers disclose a simple sketch of data flow that indicates how results were generated.

  5. Link your data to your publication as often as possible – Data can include anything from tables and spreadsheets, to images and code. Regardless of what a study’s data is, the more of it and the earlier it is made accessible, the better. Scientists are encouraged to embed citations in their data and code.

  6. Publish your code – Although it may not be perfect, publication of one’s code can be important in the replication and understanding of one’s data.

  7. State how you want to get credit – Goodman et al. simply suggest making known your expectations of how you would like to be acknowledged for your data.

  8. Foster and use data repositories – It is important to find a good place to share data and code. Often, there will be an existing repository within a field. However, if there isn’t, the authors encourage asking information specialists or librarians within that field.

  9. Reward colleagues who share their data properly – Rewarding those who share data and code, and acknowledging colleagues’ good practices, will encourage the continuation and development of these habits.

  10. Be a booster for data science – As scientists, we should all help push our institutions towards better, more reproducible research. And we should advocate for improved data sharing. We should pass on our knowledge to graduate and undergraduate students through classes and workshops so that more will see the value of “well-loved data.”
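To make rules 2, 4, and 7 a little more concrete, here is a minimal sketch in Python of a small record that could sit alongside a shared data file. It is only an illustration, not something taken from the Goodman et al. paper: the function names, the field names, the dataset title, the citation text, and the placeholder identifier are all hypothetical, and a real repository will have its own metadata format.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw bytes, used to fingerprint a data file."""
    return hashlib.sha256(data).hexdigest()


def build_record(title: str, citation: str, identifier: str,
                 steps: list, data: bytes) -> dict:
    """Bundle an identifier, a preferred citation, and ordered workflow steps.

    All field names here are hypothetical and chosen only for illustration.
    """
    return {
        "title": title,
        "identifier": identifier,            # rule 2: permanent identifier
        "preferred_citation": citation,      # rule 7: state how you want credit
        "checksum_sha256": sha256_of(data),  # lets others confirm they have the same file
        "workflow": [                        # rule 4: simple sketch of the data flow
            {"order": i + 1, "description": step}
            for i, step in enumerate(steps)
        ],
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Placeholder values: none of these refer to a real dataset or DOI.
record = build_record(
    title="Example survey data (hypothetical)",
    citation="If you use this data, please cite: Author, A. (Year). Example survey data.",
    identifier="doi:10.0000/placeholder",
    steps=[
        "Collected raw responses",
        "Dropped incomplete rows",
        "Constructed index variable",
    ],
    data=b"id,score\n1,4\n2,5\n",
)

print(json.dumps(record, indent=2))
```

Even a plain-text record like this, posted next to the data and the cleaning code, tells a reader what was done, in what order, and how to cite the result.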

The whole editorial is worth reading and is, not coincidentally, open access. In addition to these rules, the authors give an extensive list of useful resources including open access repositories and software to help manage a more reproducible workflow.

If you want to dive deeper into the material, you can read the whole paper by clicking on the link in the SEE ALSO section at the bottom of this page.


Reference

Goodman, Alyssa, et al. (2014). “Ten Simple Rules for the Care and Feeding of Scientific Data”, PLoS Computational Biology, 10(4), e1003542.

This video is from the free online course:

Transparent and Open Social Science Research

University of California, Berkeley