What is Research Misconduct? Part 1: Plagiarism

This is Part 1 of a series of 3, which also includes Part 2: Falsification, and Part 3: Fabrication.

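The Office of Research Integrity (ORI), part of the U.S. Department of Health and Human Services, defines research misconduct on its website as fabrication, falsification, or plagiarism in proposing, performing, or reviewing research, or in reporting research results.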

Let’s clarify that a bit more with some examples. In this blog post, I will discuss plagiarism.

Scientific papers should be original and unique – that is what moves science forward. Plagiarism is defined as re-using someone else’s words, ideas, or results without giving appropriate credit. Although this might sound clear at first glance, there is a huge grey zone. It is fairly easy to scan for textual similarities (like a BLAST search with a DNA sequence!), but it is much harder to prove that ideas have been reused.

Textual similarities

Text similarities in scientific papers are fairly easy to detect. Just take a couple of words between quotes and search the web or Google Scholar for these text strings.

Most scientific journals will scan the incoming manuscripts for textual similarities with special software. The most widely used software is iThenticate, which can scan texts against a proprietary database of already published scientific papers, books, and websites. This is expensive software that I would love to be able to afford. Unfortunately, the company does not give free licenses to science integrity detectives like me.
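To give a rough idea of what a textual-similarity scan does, here is a minimal Python sketch. It is not how iThenticate works (that software and its database are proprietary); it simply assumes that two texts sharing many runs of five consecutive words deserve a closer look. The function names `shingles` and `overlap_fraction`, the window size of 5 words, and the example sentences are illustrative choices, not part of any real tool.

```python
import re


def shingles(text, n=5):
    """Return the set of lowercase word n-grams ("shingles") found in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_fraction(candidate, source, n=5):
    """Fraction of the candidate's word n-grams that also occur in the source."""
    cand = shingles(candidate, n)
    if not cand:
        # Candidate is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(cand & shingles(source, n)) / len(cand)


# Hypothetical example: a manuscript sentence compared against an older text.
older_text = (
    "Probiotics are live microorganisms which when administered in "
    "adequate amounts confer a health benefit on the host"
)
manuscript = (
    "We note that probiotics are live microorganisms which when "
    "administered help"
)
print(f"{overlap_fraction(manuscript, older_text):.0%} of 5-word runs overlap")
```

A real scan would compare a manuscript against millions of sources and would have to ignore properly quoted text, so a score like this is only a flag for a human to look closer, never a verdict.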

On my wishlist: A free/affordable plagiarism detection tool based on Google Scholar.

Some short text similarities are OK

A little bit of textual plagiarism is acceptable – as long as it is limited to a couple of sentences, not a whole paragraph. By definition, a definition will be identical to text in many older papers, and this is totally acceptable (also see the example above). Here are some examples of definitions that will not be considered plagiarism. Obviously, it is nice to add a reference to such statements if possible.

“Probiotics are live microorganisms, which when administered in adequate amounts confer a health benefit on the host” (>750 occurrences in Google Scholar, source: FAO/WHO definition).

“Apoptosis is programmed cell death” (>1000 occurrences; textbook definition)

“The human microbiome refers to the community of microorganisms”

“Ecology is the branch of biology that deals with the relations of organisms to one another and to their physical surroundings.”

Some textual similarities in the Methods are also totally fine, again, as long as they are limited to small blocks of text, not the complete Methods section. There are only so many ways you can describe that you ran DNA on a 1.5% agarose gel with ethidium bromide, housed your mice in cages and gave them ad libitum water, or that you analyzed your 16S rRNA amplicon sequences with mothur or QIIME.

Some “stolen” sentences in the Introduction of a research paper, as long as the data is novel, are obviously less worrisome than copy/pasting the complete introduction or the data from someone else’s papers, as shown below.

Large text similarities are not OK

Stealing someone else’s text is considered science misconduct, not flattery, in particular when we are talking about large chunks of text.

Here are some examples of textual similarities that are not acceptable. This is a screenshot from a (now retracted) review paper written by a single author, who also happened to be the Associate Editor for that journal. The colors show text that was identical to text found in other papers, with a different color for each source paper.

The screenshot below is another example of a paper retracted for plagiarism, this time because the Introduction of this research paper was largely similar to text written by others.

How to properly re-use someone else’s text

If you are going to use a sentence or a larger block of text written by another researcher, you can do that if you use quotation marks and a clear attribution (a citation to the source). For example:

A 2014 paper found that “although human-associated microbial communities are generally stable, they can be quickly and profoundly altered by common human actions and experiences” (David LA et al., Genome Biology 2014, 15:R89).

Text recycling

How about the re-use of text you have written yourself, “text recycling”? This is sometimes called “self-plagiarism”, although this is not plagiarism according to the ORI definition, because it does not involve re-using someone else’s text.

Text recycling is one of those grey zones where it is hard to draw a clear line. Is it OK to submit the same essay, that you have written yourself, to 2 different teachers?

Publishing the exact same paper twice is not allowed by most journals, because science papers need to be original and not submitted somewhere else. On the other hand, a couple of lines in the introduction taken from a previous paper by the same author is acceptable, again, as long as it is not a complete paragraph.

The screenshot below shows the complete text of a review paper. The bright yellow color is recycled text from an older review by the same authors, while all the other colors indicate text taken from papers written by different authors. Unfortunately, the journal Drugs did not seem to find this a problem at all, and the paper has not been corrected or retracted. Maybe this has to do with the 300 citations that this paper already has received. Would you agree with this decision?

Data or figure plagiarism

Another form of science misconduct that falls under plagiarism is copying data or figures from a paper written by someone else. These types of overlap are much harder to catch. There is no good software on the market yet that can scan manuscripts for figure reuse – although people are working hard to make such tools.

Here is an example of a retraction of a scientific paper that used figures from an older paper authored by a different research group: “Author loses five recent papers for copying multiple figures, unspecified “overlap”” (from Retraction Watch).

Update: Here is a blog post from Paul Brookes confirming that the ORI does not consider reuse of images by the same authors to be misconduct, as long as the images describe the same experiments.

Below is a screenshot from another example that was recently discussed online (see e.g. this @SmutClyde tweet). Here, it appears that a 2015 paper on prostate cancer (shown on the left) might have re-used data from a 2014 paper on liver cancer (shown on the right). Note that the numbers are identical, and that the left paper reported women with prostate cancer. This case is presumed to be under investigation, and might be part of a bigger set of papers that all appear to have copied each other’s data. See also Part 3 of this blog post series about data fabrication and possible overlap with data plagiarism.

Striking similarities between data presented in a 2015 paper on prostate cancer (left) and a 2014 paper on liver cancer (right).
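The “identical numbers” observation can be made a little more systematic. The sketch below assumes the two tables have already been typed in by hand as lists of rows, and counts how many cells in the same position hold exactly the same value. The function name `identical_cell_fraction` and the numbers in the example are hypothetical; they are not taken from the papers discussed above.

```python
def identical_cell_fraction(table_a, table_b):
    """Fraction of same-position cells with identical values in two tables.

    Both tables are lists of rows and must have exactly the same shape.
    """
    same_shape = len(table_a) == len(table_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(table_a, table_b)
    )
    if not same_shape:
        raise ValueError("tables must have the same shape")
    cells = [
        (a, b)
        for row_a, row_b in zip(table_a, table_b)
        for a, b in zip(row_a, row_b)
    ]
    if not cells:
        return 0.0
    return sum(a == b for a, b in cells) / len(cells)


# Made-up patient-characteristics tables from two hypothetical papers.
paper_2015 = [[34, 26, 60], [41, 19, 60], [12.5, 3.1, 0.04]]
paper_2014 = [[34, 26, 60], [41, 19, 60], [12.5, 3.1, 0.03]]
print(f"{identical_cell_fraction(paper_2015, paper_2014):.0%} of cells identical")
```

A high fraction is not proof of copying, only a reason to look more closely: two independent studies on different patients would not normally report the same counts and statistics in cell after cell.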

Other forms of plagiarism

Plagiarism that does not involve exact textual or number similarities is even harder to define, in particular when it is about copying someone’s ideas.

For an inexperienced scientist, is it OK to write a paper using another paper as a “scaffold” and plug in your own figures and data? Maybe they did not have a lot of experience, and just needed some help.

Is it OK to write about a cool new hypothesis and not cite work from other people who recently published the exact same idea? You could argue that the person might not have seen that other paper.

One form of plagiarism that is not OK is for peer-reviewers to steal ideas from a manuscript they are reviewing. If you do a peer-review of a paper, you accept the agreement that you cannot use the ideas of the paper for your own benefit. There have unfortunately been some cases (Retraction Watch, December 2016, May 2017, February 2019) where a researcher’s manuscript got rejected, but was published later – by a different group.

Further reading

Here is some further reading on this topic (will update if I come across more):

6 thoughts on “What is Research Misconduct? Part 1: Plagiarism”

  1. Dr. Bik: as you will recall, not all authors share your views on self-plagiarism,

    “Nothing shortcoming about my copyrights being used by myself. You’re just unhappy about what is being published. You are a KETO stalker trying to bully and harass your way. You undoubtedly will never be part of the bigger most important work of trying to improve Medicine or advancing medical knowledge. Name the National TV program when and where you would like to discuss tete-a-tete.”


  2. When (if) you run out of easy material consider posting about some of your preferred tools for identifying problems in papers. You’ve been kind enough to share some ideas with me and they’ve been helpful.


  3. There have unfortunately been some cases (Retraction Watch, December 2016, May 2017, February 2019) where a researcher’s manuscript got rejected, but was published later – by a different group.

    I suspect that there are more cases yet to emerge.

