The Tadpole Paper Mill

Tadpoles. Modified image by MarjanNo from Pixabay.

Most posts on this website are about duplications within or between figures in the same paper, or about duplications found between papers by the same group of authors. But now, our small group of image forensics detectives has come across a large set of papers – over 400 as of today – from different authors and affiliations that all appear to have been generated by the same source. Based on the resemblance of the Western blot bands to tadpoles (the larval stage of an amphibian, such as a frog or a toad), we will call this the Tadpole Paper Mill.

Update: for those who cannot access the Google Sheets link above, here is a PDF version embedded in WordPress which I hope will work for all (generated March 3, 2020):

What is a paper mill?

A paper mill is a shady company that produces scientific papers on demand. Paper mills sell these papers to, for example, medical doctors in China who need to have a scientific paper published in an international journal in order to get their MD, but who do not have any time in their educational program to actually do research. Authorships on ready-to-submit or already-accepted papers are sold to medical students for hefty amounts, as described, for example, by Mara Hvistendahl in ‘China’s Publication Bazaar’ in Science, 2013. Whether the experiments described in these papers have actually been performed is not clear. Some of these paper mills might have laboratories producing actual images or results, but such images might be sold to multiple authors to represent different experiments. Thus, the data included in these papers is often falsified or fabricated.

Jana Christopher, an Image Data Integrity Analyst, described such a paper mill in her 2018 paper ‘Systematic fabrication of scientific images revealed’ in FEBS Letters.

In her paper, Christopher describes a set of 12 manuscripts submitted to different FEBS journals, all of which contained Western blots with very regularly spaced and particularly shaped bands, an unusual absence of dots, smears, or stains, and identical backgrounds. These manuscripts, however, came from different research groups at different institutions.

Figure 2 from Christopher, FEBS Letters 592 (2018) 3027–3029, DOI: 10.1002/1873-3468.13201. Note similarities in band shapes and background patterns. The left panels and right panels were from different manuscripts with different authors and affiliations.

Jennifer A. Byrne, a professor at the University of Sydney, has also raised concerns about such fraudulent studies produced by paper mills, in particular those describing gene knockdown experiments. In her 2019 paper ‘The Possibility of Systematic Research Fraud Targeting Under-Studied Human Genes: Causes, Consequences, and Potential Solutions’, she describes a cohort of gene knockdown papers with a very similar figure layout and order.

Now, Byrne and Christopher have teamed up to publish another paper about paper mill publications. Their paper ‘Digital magic, or the dark arts of the 21st century—how can journals and peer reviewers detect manuscripts and publications from paper mills?’ was published this week in FEBS Letters.

The Tadpole Paper Mill

As it turns out, Byrne and Christopher’s publication describes the exact same set of papers that our small team of image forensics detectives has been working on over the past month. While they show results from 17 unrelated manuscripts, our team, consisting of Smut Clyde, Morty, Tiger, and me, has found about 400 papers so far. You can find the (still growing) list of papers here.

As described by Smut Clyde on Leonid Schneider’s blog For Better Science, in a blog post called ‘The full-service paper mill and its Chinese customers’, the background similarities between Western blot panels were first noticed by pseudonymous PubPeer user Indigofera Tanganyikensis in two PubPeer posts here and here.

Not only was the Western blot background highly similar between panels within the same figure, it was also unexpectedly similar between panels from different papers.

As previously described by Christopher (see above), the Western blot bands in all 400+ papers are very regularly spaced and have a smooth appearance in the shape of a dumbbell or tadpole, without any of the usual smudges or stains. All bands are placed on similar-looking backgrounds, suggesting they were copy/pasted from other sources or computer-generated.

Illustration made by Hoya Camphorifolia that highlights the identical background patterns in the Western blot figures of the Tadpole Paper Mill papers. Source: https://pubpeer.com/publications/F6714DDB461DD528080734B32A0AC9

Death Star Flow Cytometry plots

In addition to the similarities between the Western blots in all 400+ papers, there are other similarities too. Some of the papers contain unusually shaped flow cytometry plots, which Smut Clyde beautifully described as follows (referring to the Death Star, a weapon from Star Wars):

The Death Star down in each frame’s quadrant Q3 is shooting out bolts of planet-smashing energy to the right through Quadrant Q4, while disciplined flotillas of Rebel Alliance X-Wing starfighters are swooping down through Q1 and Q2.

Smut Clyde – ‘The full-service paper mill and its Chinese customers’

Unusually shaped #FlawCytometry panels in one of the Tadpole Tales. Source: https://pubpeer.com/publications/B5D48700144142EC185E1800F86FD8

Layout look-alikes

The bar graphs in most of these papers also look very similar. They are usually in solid shades of grey, with the black bar on the left, and with double-edged error bars. As with the titles, such layout similarities in figures are not necessarily a problem by themselves; this might simply be the standard format of a popular graphing program. However, when the same layout is found in hundreds of papers from different institutes, it is a red flag.

Figures from four different papers by different research groups that all share a similar layout. Note the similarities between bar graph layout, fonts, and Western blots.

Similar title structures reveal paper mill suspects

Most of the papers share a similar title structure. This is not necessarily a problem by itself, in particular for papers written by authors for whom English is not a first language. However, in combination with the Western blot background similarities and the greyscale bar graphs, the title structure is one of the signs that a paper belongs to the Tadpole paper mill set. Examples of paper titles that follow this structure are:

Here is an illustration of how the titles might be generated.

Tadpole paper mill title generator
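The idea behind such a title generator can be sketched as a simple fill-in-the-blanks template. The Python below is a hypothetical illustration, not the paper mill’s actual method: the template, the slot names, and every slot value are assumptions, with the values borrowed only from titles quoted in the comments on this post (the verb “regulates” is added for symmetry). With just two options per slot, one template already yields 64 distinct but similar-sounding titles, and with these assumed values the first one happens to match the title quoted by the medical student in the comments.

```python
from itertools import product

# Hypothetical slot values, borrowed from titles quoted in the comments on
# this post. The real generator may use different slots and wording.
SLOTS = {
    "reg": ["MicroRNA-10b", "miR-194"],
    "verb": ["controls", "regulates"],
    "proc": ["metastasis and proliferation", "proliferation and apoptosis"],
    "dis": ["colorectal cancer cells", "breast cancer cells"],
    "link": ["regulating", "targeting"],
    "tgt": ["Krüppel-like factor 4", "FOXA1"],
}

# Hypothetical template mimicking the shared title skeleton:
# <regulator> <verb> the <processes> of <disease cells> by <link> <target gene>
TEMPLATE = "{reg} {verb} the {proc} of {dis} by {link} {tgt}"


def generate_titles(template=TEMPLATE, slots=SLOTS):
    """Return one title per combination of slot values (Cartesian product)."""
    keys = list(slots)  # dicts keep insertion order, so slot order is stable
    titles = []
    for values in product(*(slots[k] for k in keys)):
        titles.append(template.format(**dict(zip(keys, values))))
    return titles


if __name__ == "__main__":
    titles = generate_titles()
    print(len(titles))  # 2**6 = 64 combinations
    print(titles[0])
```

A handful of interchangeable phrases plugged into one skeleton produces titles that look different at a glance but are easy to recognize once you know the pattern, which is why the structure is only a weak signal on its own.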

Affected institutes and journals

If you take a look at the list of 400+ papers, you can see some patterns. Note that the publicly available list always lags a bit behind the list we are actively working on, so it might contain slightly fewer papers. We will try to update it regularly.

The majority of the Tadpole Tales have been published in the following six journals, from several publishing houses:

The earliest papers from the Tadpole paper mill are from 2016, although the bulk of the papers have appeared in 2018-2020.

Many of the papers appear to have been written by authors affiliated with hospitals or medical schools, which is not surprising, since having at least one publication is part of the requirements to become an MD in medical schools in China. Affiliations with the largest numbers of papers include:

  • Jining No. 1 People’s Hospital: 101 papers
  • China‐Japan Union Hospital of Jilin University: 59 papers
  • The Affiliated Hospital of Qingdao University: 23 papers
  • Linyi Central Hospital: 16 papers
  • The First Affiliated Hospital of Zhengzhou University: 16 papers
  • Affiliated Hospital of Jining Medical University: 12 papers

Although more than 100 papers were authored by doctors from Jining No. 1 People’s Hospital in Shandong province, there is little overlap between the authors. The papers were affiliated with a range of departments, including Pediatrics (15 papers), Cardiology (6), Endocrinology (6), Nephrology (6), and Vascular Surgery (5).

Raising the alarm

This large set of presumably falsified papers is cause for great concern. Our small team of volunteers has found 400 papers based on similarities in titles, keywords, and layout. These were relatively easy to find, although finding them did take many hours of unpaid work. One can only guess how many other falsified publications have been produced by other paper mills using less recognizable and more variable layouts. The number of such paper mill articles might be in the thousands.

It is also sad to see that all these papers have been published after peer-review with apparently little editorial quality control.

As suggested by both Christopher and Byrne in the papers mentioned above, critical screening of images and other data should be an essential part of the pre-publication process of scientific papers. Scientists pay money to either publish or read these papers (depending on whether they are paywalled or open access), and they might spend months or years trying to replicate these results. Publishers should therefore do a much better job of screening papers before accepting them for publication, and they should not rely solely on unpaid and untrained peer reviewers or post-publication volunteer work to detect fabricated images.

In addition, it is of great concern to see that this specific paper mill has successfully “infected” particular journals. An anonymous source has even suggested that some of these journals might be actively working together with such paper mills, promoting the generation of fake papers and the selling of such publications to the large army of aspiring MDs desperate to finish their degrees. Although this rumor has not yet been confirmed, it is very alarming to see that journal editors do not appear to have noticed the similarities between dozens of papers published in their journals.

As a scientific community, we need to work harder in detecting such fraudulent papers. Finding these fabricated images should not rely solely on the work of unpaid volunteers.

29 thoughts on “The Tadpole Paper Mill”

  1. Pingback: New top story on Hacker News: The Tadpole Paper Mill – The Pakistani News Corner
  2. I wondered how long it would take me to find one of these. About five minutes. I picked the journal from your list of top journals that had the fewest articles, looked at the free sample issue, and voila:

    Effects of RNA interference‐mediated silencing of toll‐like receptor 4 gene on proliferation and apoptosis of human breast cancer MCF‐7 and MDA‐MB‐231 cells: An in vitro study
    Xiao‐Ling Gao, Jiao‐Jiao Yang, Shu‐Juan Wang, Yan Chen, Bei Wang, Er‐Jing Cheng, Jian‐Nan Gong, Yan‐Ting Dong, Dai Liu, Xiang‐Li Wang, Ya‐Qiong Huang, Dong‐Dong An
    Journal of Cellular Physiology 234 (2018)
    https://doi.org/10.1002/jcp.26573

      1. It could be that this is an example of using the same software, but not being part of the paper mill. Could be a useful control if so. I will try to look at it in more detail tonight.

  3. I have great respect for your team’s effort. As a medical student from China, I face a dilemma: choosing a tutor. Coincidentally, a tutor I favour has published an article in one of the journals you mentioned above, titled ‘MicroRNA-10b controls the metastasis and proliferation of colorectal cancer cells by regulating Krüppel-like factor 4’. Could you help me confirm whether this article is like the ones you have picked out? Thank you!

    1. I have just added a link to a PDF version of the list in the beginning of this post, so that people without access to Google Sheets can download it. Let me know if that works for you.

  4. One should check whether the published results are reproducible or faked. I’ve noticed that they are more often non-reproducible.

  5. These funny-business papers routinely use siRNA knockdown, and many use the same company to produce their lentiviral constructs. When you perform a Google Scholar search for the company, all you find are funny-business papers. It makes one wonder if these are the companies providing the “data”: Shanghai Hollybio, Wuhan Genesil Biotechnology, Shanghai Gene Chem, etc.

  6. While looking at the above images I notice many more copy-&-paste iterations within the background; though, it only takes a single C&P to equate to falsified data. Excellent work, and surely much appreciated by those in the scientific community who value honesty and integrity. Shame on those guilty of this reprehensible practice. Conceivably, the publishers are taking a step back to determine how they can better prescreen submissions to mitigate future embarrassments and safeguard their reputations.

  7. Hi! Thank you for all the hard work you have put into uncovering this paper mill!

    I have a question on further tips to identify fake papers. I found the following which contains mainly graphs and just one small western blot image. It looks a bit suspicious due to the journal (J Cell Biochem), the structure of the title and the use of a lentiviral vector, but the lack of images makes it hard to confirm it. Do you have any tips? Could the presence of multiple first authors also be a red flag, since maybe multiple aspiring doctors would obtain their required publication in just one paper?

    Zhang, X, Chen, Q, Shen, J, Wang, L, Cai, Y, Zhu, K‐R. miR‐194 relieve neuropathic pain and prevent neuroinflammation via targeting FOXA1. J Cell Biochem. 2020; 1– 8. https://doi.org/10.1002/jcb.29598

    Thank you!

    1. Hi Martina, this paper does not contain the specific Western blot pattern that is characteristic of the Tadpole Paper Mill, nor does it have any images that would suggest it belongs to another paper mill. The title structure and multiple first authors are possible red flags, but many papers have those features; without any image problems, we cannot be suspicious.

  8. So as a librarian in an institution that subscribes to Elsevier, Wiley, and Taylor & Francis ejournal packages, aren’t these articles going through peer review? If so, why aren’t they caught in the peer review process? Why aren’t these major publishers catching this before publication?

    1. They have been through peer review, but it is hard to see the similarities if you only handle one paper at a time. However, the similarities in background between multiple panels in the same paper should have been a huge red flag. Publishers are not screening (enough) for these images. It takes only seconds to find them, so let’s hope they will invest some extra time to screen them going forward.

  9. To be fair, I don’t think that layout similarities in the plots are necessarily a convincing sign of dubious origin. The ones above all seem to have been created in a commercial program called GraphPad Prism, which just looks like this in its default configuration.
    Equally, if you use e.g. the popular ggplot2 package for the R statistical environment with its default theme, you are going to end up with plots with a distinct visual appearance and color palette that is immediately recognizable across papers but says absolutely nothing about the source or any possible duplication of data.

  10. I have reviewed or read papers written by Chinese speakers (in a different domain). Inevitably their English is quite bad (although better than my non-existent Chinese). I could imagine that there might be translators who specialize in translating biology (etc.) papers into English, and that these putative human translators might use Computer Assisted Translation (CAT) systems. These are similar to Machine Translation (MT) systems in that they use statistical matching methods, but differ in that they typically store texts as pairs of sentences in the original language + translations by their human users, rather than being fed large bilingual corpora. They chop the original text into sentences, and offer you one or more translations based on the similarity of those sentences to sentences they have stored. While I have not seen studies, it’s possible that this would lead to similarities in sentence structure among articles translated by the same translator (or a team of translators who share the stored translations), like the ones you observe (especially in titles).
