Discontinuous ridiculous stools – a preprint full of tortured phrases and stolen data

Patients with provocative entrail illness unclassified gave to crisis division a 3-day history of sickness, retching, migraine and irregular stomach torment alongside discontinuous ridiculous stools as of late.

If you cannot wrap your brain around this sentence, don’t worry. Neither can I.

A photo of a very ridiculous stool: a poop-emoji cake, with big white googly eyes and twisted candles on top. Taken at uBiome headquarters, March 2017.

Tortured phrases

The wording is full of tortured phrases, a specific way of rewording text used by authors who want to disguise plagiarized text. To evade plagiarism detection tools, they run the copied text through ‘synonymizer’ software to find alternative words. This can result in nonsensical or even funny-sounding phrases.

Some common tortured phrases include:

  • “Counterfeit consciousness” instead of “artificial intelligence”
  • “Profound neural organization” instead of “deep neural network”
  • “Colossal information” instead of “big data”
  • “Bosom peril” instead of “breast cancer”
  • “Haze figuring” instead of “cloud computing”

Such tortured phrases were first described by Guillaume Cabanac, Cyril Labbé, and Alexander Magazinov, in their 2021 preprint Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals. Their Problematic Paper Screener database currently has over 21,000 papers containing five or more of those phrases. Most were published in the early 2020s, with their incidence declining a bit after 2022 – presumably because ChatGPT and other generative Artificial Intelligence tools can do a much better job rewriting, and thus hiding, plagiarized text.
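To make the idea of screening for these phrases concrete, here is a minimal sketch in Python. It is not the Problematic Paper Screener's actual implementation, which I have not seen; the function names are made up for illustration, the phrase list is just the five examples listed above, and counting distinct phrases (rather than total occurrences) toward the five-phrase threshold is my assumption.

```python
import re

# Tortured phrase -> expected term, taken from the examples listed above.
# A real screener would use a far longer, curated list.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "profound neural organization": "deep neural network",
    "colossal information": "big data",
    "bosom peril": "breast cancer",
    "haze figuring": "cloud computing",
}

def find_tortured_phrases(text, phrases=TORTURED_PHRASES):
    """Return {tortured phrase: number of occurrences} for phrases found in text."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase.
    normalized = re.sub(r"\s+", " ", text.lower())
    hits = {}
    for phrase in phrases:
        count = len(re.findall(r"\b" + re.escape(phrase) + r"\b", normalized))
        if count:
            hits[phrase] = count
    return hits

def is_flagged(text, min_distinct=5):
    """Flag a text containing at least min_distinct different known phrases.

    Assumption: the threshold counts distinct phrases, not total occurrences.
    """
    return len(find_tortured_phrases(text)) >= min_distinct

sample = "We use counterfeit consciousness and colossal information to detect bosom peril."
print(find_tortured_phrases(sample))
print(is_flagged(sample))
```

The sample sentence contains three of the five phrases, so it is reported but stays below the threshold. Exact string matching like this only catches phrases that are already on the list, which is why such a list needs ongoing curation.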

Graph from the Problematic Paper Screener, showing the number of papers containing at least 5 known tortured phrases, plotted per year.

A preprint with lots of strange synonymized phrases

Doing a search for the tortured phrase “provocative gut illnesses” (Inflammatory Bowel Disease; IBD) in Google Scholar, I found this gem in the medRxiv preprints collection. It is called Significance of headache in inflammatory bowel diseases, by Baqir Ali Khalid et al., DOI: 10.1101/2023.02.05.23285412, first uploaded in February 2023. The seven authors are affiliated with five universities and medical colleges in Pakistan.

Screenshot of the title and authors from the medRxiv page.

In the article, the authors claim to have collected ‘data’ from 20 IBD patients who presented at the emergency department with headache and bloody stools.

The text is very hard to understand, with some over-synonymized sentences. See if you can figure out what the authors mean by:

  • “Cerebral vein apoplexy can be deadly and finding is trying as side effects are vague”
  • “We might want to urge clinicians to continually reexamine their choices, particularly in the event that there is nonappearance of clinical improvement after a generally deep rooted treatment”
  • “EIMs address the primary driver of horribleness in Compact disc.” (Perhaps the authors mean Crohn’s Disease, which is often abbreviated as “CD”.)
  • “In the last option study, the chances proportion was 2.66 (95% certainty stretch = 1.08-6.54) contrasted with everybody” (Chances proportion = odds ratio; certainty stretch = confidence interval)
  • “As neuropathic torment seriously influences personal satisfaction and legitimizes explicit medicines, it appears to be vital to be aware on the off chance that some IBD patients ought to profit from such medicines”

Turning a single case report into a 20-patient study

In the preprint, the authors describe in detail 20 patients with “provocative entrail illness” (read: inflammatory bowel disease). They all had a 3-day sickness, they all had “irregular stomach torment alongside discontinuous ridiculous stools as of late”, and they had all been in contact with “youngsters” two days earlier. Hmmm, all 20 of them?

Then their blood test results are not given as average values, but as very specific measurements: “Blood tests upon the arrival of affirmation showed a C-responding protein of 86 mg/L, a typical white platelet count and lack of iron frailty (hemoglobin 110 g/L). Other lab discoveries like liver and pancreas chemicals, creatinine, urea and electrolytes were ordinary. Egg whites was low (26 g/L) as it was during earlier weeks.”

This sounded more like a single blood test than the results from a set of 20 patients.

With some sleuthing, it wasn’t hard to find the original text. Remember, text with tortured phrases is plagiarized text, so translating the text back into regular biomedical expressions could lead you to the source paper.
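The back-translation step can be sketched as a simple reverse lookup. This is only an illustration of the idea, not the procedure I actually followed, and `back_translate` is a hypothetical helper. Some mappings come straight from this post; the others are my own guesses at what the synonymizer replaced and are marked as such in the comments.

```python
import re

# Tortured phrase (lowercase) -> likely original biomedical term.
BACK_TRANSLATIONS = {
    "provocative entrail illness": "inflammatory bowel disease",  # stated in the post
    "chances proportion": "odds ratio",                           # stated in the post
    "certainty stretch": "confidence interval",                   # stated in the post
    "ridiculous stools": "bloody stools",                         # implied by the post
    "c-responding protein": "C-reactive protein",                 # guess
    "white platelet count": "white blood cell count",             # guess
    "lack of iron frailty": "iron deficiency anemia",             # guess
    "egg whites": "albumin",                                      # guess
}

def back_translate(text, mapping=BACK_TRANSLATIONS):
    """Replace known tortured phrases with their likely original terms."""
    # Handle longer phrases first so a short entry cannot break up a longer one.
    for phrase in sorted(mapping, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        text = pattern.sub(mapping[phrase], text)
    return text

fragment = ("showed a C-responding protein of 86 mg/L, a typical white platelet "
            "count and lack of iron frailty (hemoglobin 110 g/L)")
print(back_translate(fragment))
```

The cleaned-up fragment, together with the specific lab values, makes a much more searchable query than the tortured version.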

Here, the source paper was a case report about a 15-year-old IBD patient, published by Orfei et al. in BMJ Case Reports in 2019, DOI: 10.1136/bcr-2018-227228. The authors of the 2023 medRxiv preprint appear to have taken the 2019 case report and reworded it to sound like a study of 20 patients.

Here is a side-by-side comparison of the 2019 BMJ Case Reports paper (left) and the 2023 medRxiv preprint (right). I have color-coded some sentences to help with the navigation. Can you spot the stray “her” that the preprint authors left in by accident?

Left: text from Martina Orfei et al., BMJ Case Rep (2019), DOI: 10.1136/bcr-2018-227228. Right: text from Baqir Ali Khalid et al., medRxiv preprint (2023), DOI: 10.1101/2023.02.05.23285412.

Copied data from National Health Interview Survey

The preprint continues with “data” collected from “20 patients”. Tables 1 and 2 list patient characteristics such as smoking status, body mass index, and migraine occurrence. Interestingly, the text reads “The overall age-adjusted prevalence of migraine or severe headache was 15.4% (n = 9,062) and of IBD was 1.2% (n = 862).” With only 20 patients in the study, those are unexpected n’s.

Using some of the numbers in Table 1, I could easily find the source of the data. All values were identical to those in Yong Liu et al., Headache (2021), DOI: 10.1111/head.14087, a study carried out on 60,436 US adults who participated in the 2015 and 2016 National Health Interview Survey. That’s a lot more than 20 patients!
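The mismatch between the stated sample size and the reported n’s is the kind of thing that can be checked mechanically. Here is a small sketch using only the numbers quoted above; `impossible_counts` is a hypothetical helper name, not part of any existing tool.

```python
def impossible_counts(sample_size, reported_counts):
    """Return the reported subgroup counts that exceed the stated sample size."""
    return {label: n for label, n in reported_counts.items() if n > sample_size}

# Subgroup sizes quoted in the preprint text.
reported = {"migraine or severe headache": 9062, "IBD": 862}

# Against the preprint's claimed 20 patients, both counts are impossible.
print(impossible_counts(20, reported))

# Against the 60,436 adults in the NHIS study, neither count is flagged.
print(impossible_counts(60436, reported))
```

A subgroup can never be larger than the whole study, so any count flagged this way means the numbers did not come from the sample the authors describe.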

Here’s a side-by-side comparison of the NHIS 2021 data and the 2023 preprint. The values are all identical.

Comparison of data found in Yong Liu et al., Headache (2021), DOI: 10.1111/head.14087 (left) and Baqir Ali Khalid et al., medRxiv (2023), DOI: 10.1101/2023.02.05.23285412 (right).

There are probably other scientific papers that were used to generate this preprint, but the evidence seems clear. Patient data was copied from two older sources, then the copied text was synonymized to avoid plagiarism detection.

You can find my analysis here on PubPeer. I also left a comment on the medRxiv website and notified the preprint server organizers of this tortured-phrases pearl.
