Double-dipping your data

Jan 10 2012. Published under Uncategorized

So there I was, just kicking back in my office and reviewing a manuscript like a good citizen of the scientific community. I liked this paper, which is rare--it was straightforward and didn't try to be too fancy, and seemed like it was headed for an "accept with minor revisions" fate--which is REALLY rare. But then, a record scratched--I sat up in my chair and re-read what I'd just seen. "The data from Fig X were previously published in Myself & TheOtherGuy 2011, but we include it here for direct comparison with the current data."


In all of my brief but illustrious reviewing career, I don't think I've ever seen that before. It's certainly possible some manuscripts that crossed my desk have included recycled data, but I'm pretty sure I've never had an author spell it out like that before. Is that even OK, I wondered? Naturally, I didn't keep my thoughts inside my brain, but took them to the twitterz--and what a response I got! You guys and your opinions. I'm too lazy to make a storify for this, but here are some screen shots of your responses. As you can see, most of you gave a conditional "yes," but there were a few emphatic "no!"s in there as well.

So, lovely non-twittering readers, what do you think? Can you publish your data twice? Have you ever? Did your reviewers comment on it? Comment away!

24 responses so far

  • Bob O'H says:

    I don't see a problem with this, as long as (a) they're clear about it, and (b) there is something new. I'd like to know the reasons for the "no" given by some people.

  • Bashir says:

    Depends on what you mean by data. What you're actually doing is publishing data from the same experiment twice. If your experiment yields rich enough data to address multiple research questions then why not? That seems fine.

  • Andy says:

    It sounds like the primary focus of their paper is the new data, not the old, so I think they're OK. Reading between the lines, I might also speculate that a reviewer on a previous round said, "Hmm, this new study is addressing something relevant to what you published previously. . .the authors should really include a small figure with the old data for comparison."

  • AnyEdge says:

    Check the journal policy. I've seen journals explicitly state that manuscripts may, or may not, be based on previously published data. I'd either look at the instructions for authors, or just go straight to the editor and ask. (Most journals that explicitly say it's ok seem to be more McJournals...)

  • Yael says:

    Have seen this before in JBC/MCB.

    Also, some experiments (that are destined for different papers) are done at the same time, using the same sets of controls, like the IgG controls for ChIP--why not re-publish the IgG control as long as it's explicit that it was published before?

  • DrugMonkey says:

    Is it ethical to re-run the same positive control in a new group of rats just because you already published that control? How should an IACUC focused on goals of Reduction view the request for another group in such a situation?

  • If the new paper answers a different question, but you're presenting the old data for comparison purposes, I think that's okay as it puts the information in context. Simply presenting old data without adding anything new, however, is something I think would be problematic.

  • dave says:

    yes (of course, as long as it's explicitly stated that it's re-used). I also think it's ok and even oftentimes good to publish a re-analysis of an existing dataset as long as you point out something new and interesting (for example the recent genome editing PLOS ONE paper)

  • AcademicLurker says:

    Seems fine to me.

    You did some measurements on one mutant (or knockout or whatever) and published them. A few years later you do the same measurements on a different mutant. If you want to highlight the differences between the two, it's easier to put both results in the same paper than to tell the reader to go back to your paper x years ago and compare for themselves.

    As long as you clearly indicate that the old data was published previously I don't see a problem.

  • MRW says:

    I'm an analytical chemist, and I mostly deal with detecting trace metals. I often include a table that's more than half previously published data - a typical table will have the detection limits for several elements by whatever the latest version of the instrument I've built is, detection limits for the same elements by the previous version of the instrument, and detection limits by a commonly used instrument. That makes the table 2/3rds previously published data, and usually 1/3rd my old data, but I find it easier to make the comparisons with everything in one place rather than having the reader hunt down the original papers.

    It sounds like the situation you've encountered is similar. I think it basically depends on how important the figure is to the comparison. If the figure isn't really necessary to understand the comparison, then a reference to the previous paper is enough. If everyone who wants to understand the new results is going to have to dig up the old paper to compare the figures side-by-side, I think the figure should be included in the new paper.

    After all, review articles usually have old figures, collected together for convenience.

    Of course, the other consideration is that the previous journal may hold the copyright, so they need to give permission for the figure to be used.

  • drugmonkey says:

    MRW raises a key point for the Open nutts.

    Journals own what exactly? When they are justifying their "contribution" they focus on layout issues. That would imply they don't own the underlying data itself. So regraphing should be okay, right? :-). Okay, not if the exact same figure..but how about a single series on a multiseries graph?

    What about genetic models? The publisher has no ownership but the value to the world is closely tied to the data in the paper...

    How much is owned by the publisher?

  • Dr 27 says:

    In a way I kind of applaud their honesty .. but at the same time, I think the answer to your question may depend (somewhat) on the field. Take for instance structural biology (my field). I don't have any solid numbers, but for some (if not most) of the papers I've read that build upon an existing structure or a previously published one (or even a tiny bit of a structure, like say, a ligand bound to a part of a protein) it is quite common to bring that data back, possibly displayed in a different manner, if you're doing a direct comparison. Say an EM person publishes a high resolution structure of something that had been published by X-ray 10 years ago or whatever, I've seen people who do a side by side comparison of both structures and the details to highlight things that weren't visible or weren't solved previously. Now, I don't know if this is totes applicable to what you're referring to, but at least in struct bio it is somewhat common to revive data from the past to illustrate (or contrast) changes in a newly published paper.

  • Heavy says:

    No problem here since it was disclosed. Just reviewed a manuscript where they hid the fact and offered only minor new analyses with previously published data.....

  • Dr. O says:

    To elaborate on my tweet: I've seen many examples of this with large-scale (and often very expensive) experiments, and different aspects of the same experiment are construed into multiple stories. In these papers, the re-used data is only a stepping off point for the paper, with the majority of the manuscript focused on completely novel findings.

  • Dr. Cynicism says:

    Meh, I'm usually not a fan of double-dipping -- makes people look like they're hard up for ideas and work. But we all know labs that do it. There are some PIs out there that seem to have the exact same paper in 6 or 7 journals; it's crazy. But honestly, if there is a big enough addition/tweak or something, I could see re-using a sample for something small like comparison purposes? Oh what the hell do I know.

  • Travis says:

    Rather than a dogmatic rule about whether it's kosher to reuse data, doesn't it make more sense to ask whether this data aids the reader in interpreting the figure? If it does, then who cares if part of the data has been published elsewhere? (I'm assuming from your description that the manuscript is not largely based on recycling old data, but uses it simply for context in this figure)

    As some others have mentioned this may be a domain-specific thing - in my area (physiology/epidemiology) it's very common to produce multiple manuscripts from a given study looking at different outcomes (e.g. one manuscript looking at the effect of an intervention on food intake, another focusing on changes in energy expenditure). As a result, these manuscripts have a fair amount of overlap on all but the primary outcome variables (e.g. subject characteristics tables, controlled variables, are all likely to be similar across manuscripts). There is a bit of a grey zone between acceptable/unacceptable amounts of overlap (the epidemiology folks seem comfortable with a lot more overlap than the physiologists), but it simply wouldn't be feasible to cram all the data into a single manuscript in most journals.

  • D. C. Sessions says:

    Because the social sciences use HUGE data sets which cost a $BUNDLE to acquire (remember: IRBs!) it's pretty common to use the same data set to investigate multiple research questions. In extreme cases, consider the data mining that medical researchers do in Europe with national health records.

    Or, for that matter, the climate records used in global warming research.

  • ecologist says:

    Not only OK, but to be encouraged. If your data are completely used up in a single publication, you are collecting awfully simple data. If your new paper warrants comparison with a previous analysis, put it in! Why on earth not?

    As for who owns what, if you are republishing the same graphic, you can either get permission from the original copyright holder, or make use of the small print on every author copyright agreement I have seen recently that, in return for signing over your copyright, gives you permission to re-use the material in any other publication of yours. No problem.

  • FunkDoctorX says:

    It seems the large majority of folks think it's alright to republish old data for comparison purposes (although I don't quite know how you can properly do the stats). What I think has been lost thus far in the discussion is the value of replication (assuming this is not a large scale, expensive experiment). Being able to demonstrate an effect more than once provides a check on the validity of the initial result. If they want to compare, they should re-run and replicate the experiment. That way the stats can be done properly (using the same controls or whatever) and they can replicate a previous effect, increasing the confidence that the previous effect is indeed legitimate (again, unless this is one of the expensive "omics" experiments).

  • Confounding says:

    Generally speaking, I don't have a problem with reusing data as long as it's answering a different question *or* is answering the same question in an entirely new light (think reanalysis of an old data set using a new technique).

    This may be because I come from a field where a single large cohort study can potentially foster dozens of different research questions, and not reusing it in some form would be staggeringly wasteful.

  • pyrope says:

    The question as a reviewer is simply whether the new data plus new interpretations are sufficiently interesting/distinct from the previous work to merit publication in the journal. If so, then there's no problem. I agree with previous comments that it is nice to see that level of transparency.

  • Anon says:

    If you did some sort of tissue optics measurement and then you overlaid the known absorption spectra of Hb and HbO2 on your data to compare, guess what? You ARE republishing data. Even worse, you're publishing somebody else's data! But nobody would fault you for it as long as you took the spectra from a reliable source and cited it properly. (Unless you were using a technique that had serious calibration issues so the most appropriate control was a spectrum acquired on your own instrument.) If you were an astronomer, and you overlaid a solar spectrum on your data to compare, nobody would fault you.

    Besides, think about the best way to communicate your point: If you've seen phenomenon X in specimen A, and now you think you're seeing the same phenomenon X in specimen B, one way or another you're going to have to say "In previous work we saw blah blah in the data. Here we see it as well." You can describe what you saw in words, and make the reader dig up the previous paper if they want to see what the data looked like previously. Or you can include the graphs/images/whatever for a side-by-side comparison and make the comparison very visual and immediately understandable. One of these things is much easier on the reader.
