Skip to 0 minutes and 0 seconds So far we’ve been saying, “Ah researchers, they don’t wanna share their data. They’re being selfish. They wanna get all the publications out of it themselves.” But maybe there’s something they’re not considering. And maybe the thing they’re not considering is, actually, if you posted your data, it might actually be good for you. And not just good for you because your university might condition getting tenure on you being a proponent of open data. That would be different. That would be like a regulation. This would just be like, in the sort of like “sphere” of the debate over ideas and the intellectual community, you will have a higher profile if you share your data with the research community.

Skip to 0 minutes and 36 seconds Now, if that’s true then it’s sort of like: Hey you’re being dumb! Put the time in and share your data. You’re gonna get more citations. People are going to build on your work. And maybe you’re being shortsighted by not sharing your data. So, maybe people are being lazy and they’re not assembling the replication code. But if they put the effort in it would actually be to their benefit because other people could build on what they have been doing.

Skip to 0 minutes and 59 seconds So let’s talk about the Piwowar and Vision piece. And the goal of this piece is to test this question. Does open data, does making your data public increase your citation count? So what did they do? They assemble 10 thousand papers, again from Biology, on gene expression microarray data. That sounds really cool. I have no idea what it is. And they classified the availability of data for these different studies. Does making your data available increase later citations? And they find that it does. They find that on average articles that had the data publicly available got 9% more citations.

Skip to 1 minute and 43 seconds Now, it turns out that papers that have just been published in the last few years don’t see any of this sort of benefit. It sort of seems to appear over time. So, once you’re looking at papers from like 5, or you know, 5, 6, 7 years ago, then the citation increase for papers with publicly available data is 30%.

Skip to 2 minutes and 6 seconds And they have evidence. A lot of the datasets are reused by others. So this, if it’s real, provides a professional incentive to post your data. Now, the problem with this is that these studies may be different. People who post their data may be different than those who don’t post their data. For instance, if people request my data, it may be because they’re actually interested in my study. Like they’re citing my study. They may be more likely to request it and I may be more likely to post it. If that’s the case, this correlation is problematic. Now, they have said they have 124 different characteristics on all these papers. They control for all these things.

Skip to 2 minutes and 50 seconds They control for the citation count of the lead author and they control for the exact subfield and all these other things. But to the extent there’s still some kind of effect which is like, I post my data when there is demand for my data. And there is demand for my data when I have a good paper that’s being cited. Then this correlation is harder to interpret. Like, you want some exogenous push in data posting. Because of a journal requirement or some other requirement. I don’t know, maybe I’m being too cynical. I just don’t know how convincing, at the end of the day, it is. It’s suggestive.

Skip to 3 minutes and 24 seconds It’s like a neat “finding” and it’s cool that they’re doing this kind of research. But I think we probably need a little more research design. Meaning, some sort of quasi-experimental or experimental variation, to be convinced.
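
The worry raised in the video, that authors post data when demand for the paper is already high, can be sketched with a small simulation. This is only an illustration under made-up assumptions: the "quality" variable, the posting probabilities, and the citation formula are hypothetical and are not taken from the Piwowar and Vision data. The true effect of posting is set to zero, so any citation gap in the self-selected case comes from selection alone, while the randomized case stands in for an exogenous push such as a journal requirement.

```python
import random
import statistics


def simulate(n=20000, true_effect=0.0, seed=0, randomized=False):
    """Return the mean citation gap (papers with posted data minus the rest).

    All numbers here are illustrative assumptions, not estimates from any study.
    """
    rng = random.Random(seed)
    posted_cites, unposted_cites = [], []
    for _ in range(n):
        # Hypothetical unobserved paper quality, which also drives data requests.
        quality = rng.gauss(0, 1)
        if randomized:
            # Exogenous push: posting is assigned by coin flip.
            posts = rng.random() < 0.5
        else:
            # Self-selection: authors of in-demand papers post more often.
            posts = rng.random() < (0.8 if quality > 0 else 0.2)
        # Citations depend on quality, plus whatever posting truly adds.
        cites = 10 + 3 * quality + true_effect * posts + rng.gauss(0, 1)
        (posted_cites if posts else unposted_cites).append(cites)
    return statistics.mean(posted_cites) - statistics.mean(unposted_cites)


naive_gap = simulate(randomized=False)
randomized_gap = simulate(randomized=True)
print(f"self-selected gap: {naive_gap:.2f}, randomized gap: {randomized_gap:.2f}")
```

In this toy setup the self-selected comparison shows a sizable citation gap even though posting does nothing, while the randomized comparison comes out close to zero. Controlling for observed characteristics would help only to the extent that they capture the hypothetical quality variable.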

If you post your data, it might actually be good for you!

A 2013 study by Dr. Heather Piwowar and biologist Dr. Todd Vision suggests that sharing data isn’t just good for the scientific research community as a whole, but for individual publishing authors as well. They found a positive correlation between posting data and increased citations. This video discusses their methods, as well as why Dr. Miguel isn’t entirely convinced this provides a strong incentive to share data.

This video is from the free online course:

Transparent and Open Social Science Research

University of California, Berkeley