Research fraud (plagiarism, data faking, etc)

jraven · March 19, 2024, 4:44pm

There’s also the issue that there’s not really an academic future in reproducing/validating others’ work. Very little funding or tenure opportunities for that kind of work.

meep · April 14, 2024, 7:59pm

Well, hardly surprising that someone into faking data may also have plagiarized publications:

https://www.science.org/content/article/embattled-harvard-honesty-professor-accused-plagiarism

Harvard University honesty researcher Francesca Gino, whose work has come under fire for suspected data falsification, may also have plagiarized passages in some of her high-profile publications.

A book chapter co-authored by Gino, who was found by a 2023 Harvard Business School (HBS) investigation to have committed research misconduct, contains numerous passages of text with striking similarities to 10 earlier sources. The sources include published papers and student theses, according to an analysis shared with Science by University of Montreal psychologist Erinn Acland.

Science has confirmed Acland’s findings and identified at least 15 additional passages of borrowed text in Gino’s two books, Rebel Talent: Why it Pays to Break the Rules at Work and in Life and Sidetracked: Why Our Decisions Get Derailed, and How We Can Stick to the Plan. Some passages duplicate text from news reports or blogs. Others contain phrasing identical to passages from academic literature. The extent of duplication varies between passages, but all contain multiple identical phrases, as well as clear paraphrases and significant structural similarity.
…
Debora Weber-Wulff, a plagiarism expert at the Berlin University of Applied Sciences, says Science’s findings are “quite serious” and warrant further investigation by the publishers and universities. HBS and Harvard Business Review Press, which published Sidetracked, declined to comment. Dey Street Books, a HarperCollins imprint that published Rebel Talent, and Guilford Press, publisher of the edited book The Social Psychology of Good and Evil that includes the co-authored chapter, did not respond to a request for comment.

Acland says she decided to “poke around” into Gino’s work in September 2023, after the researcher filed a $25 million lawsuit against HBS and the data sleuths who uncovered the misconduct. Acland focused on plagiarism, rather than data issues, because of her experience detecting it in student work. She searched phrases from Gino’s work on Google Scholar to see whether they matched content from other works.

She says she found apparent plagiarism in the very first sentence of the first work she assessed, the 2016 chapter “Dishonesty explained: What leads moral people to act immorally.” The sentence—“The accounting scandals and the collapse of billion-dollar companies at the beginning of the 21st century have forever changed the business landscape”—is word for word the same as a passage in a 2010 paper by the University of Washington management researcher Elizabeth Umphress and colleagues.
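The phrase-matching approach described above can be approximated locally when both texts are on hand. The following is a minimal sketch, not Acland's actual procedure (she searched phrases on Google Scholar); the function names and the six-word window are assumptions chosen for illustration.

```python
import re


def word_ngrams(text, n):
    """Return the set of lowercase word n-grams in text (hypothetical helper)."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(candidate, source, n=6):
    """Word n-grams that appear in both texts.

    A long run of shared n-grams suggests copied wording. The window size
    n=6 is an assumed default, not a published threshold.
    """
    return word_ngrams(candidate, n) & word_ngrams(source, n)
```

A word-for-word match like the sentence quoted above would yield many overlapping six-word runs, while independently written text on the same topic would usually share few or none.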

one example:

Both of Gino’s books and Gino and Ariely’s chapter contain an extensive list of references. However, for most of the examples of apparent plagiarism detected by Science, the original text was not cited.

The_Polymath · April 14, 2024, 8:08pm

That (in my view) happens a lot in academia.

People have deadlines, so they don’t paraphrase all that well or properly cite the real source.

With the advent of AI and source checking, I fully expect more people to get “caught”.

meep · April 15, 2024, 7:49am

Back in 2000, when people had access to copy/paste, sure, it’s not surprising. Plagiarism was easy to do, and back then it wasn’t necessarily easy to check.

But by 2016, those of us who had to teach writing courses (more specifically, research writing courses) were required by universities to have our students submit their papers to plagiarism-checking software. (I can talk about how crappy these were another time.)

So these profs knew they could get caught, by 2016, if anybody bothered to look. They just figured nobody was going to check.

meep · May 14, 2024, 5:58pm

https://www.wsj.com/science/academic-studies-research-paper-mills-journals-publishing-f5a3d4bc

Flood of Fake Science Forces Multiple Journal Closures

Wiley to shutter 19 more journals, some tainted by fraud

inquire within

Fake studies have flooded the publishers of top scientific journals, leading to thousands of retractions and millions of dollars in lost revenue. The biggest hit has come to Wiley, a 217-year-old publisher based in Hoboken, N.J., which Tuesday will announce that it is closing 19 journals, some of which were infected by large-scale research fraud.

In the past two years, Wiley has retracted more than 11,300 papers that appeared compromised, according to a spokesperson, and closed four journals. It isn’t alone: At least two other publishers have retracted hundreds of suspect papers each. Several others have pulled smaller clusters of bad papers.

Although this large-scale fraud represents a small percentage of submissions to journals, it threatens the legitimacy of the nearly $30 billion academic publishing industry and the credibility of science as a whole.

The discovery of nearly 900 fraudulent papers in 2022 at IOP Publishing, a physical sciences publisher, was a turning point for the nonprofit. “That really crystallized for us, everybody internally, everybody involved with the business,” said Kim Eggleton, head of peer review and research integrity at the publisher. “This is a real threat.”

The sources of the fake science are “paper mills”—businesses or individuals that, for a price, will list a scientist as an author of a wholly or partially fabricated paper. The mill then submits the work, generally avoiding the most prestigious journals in favor of publications such as one-off special editions that might not undergo as thorough a review and where they have a better chance of getting bogus work published.

World-over, scientists are under pressure to publish in peer-reviewed journals—sometimes to win grants, other times as conditions for promotions. Researchers say this motivates people to cheat the system. Many journals charge a fee to authors to publish in them.

Problematic papers typically appear in batches of up to hundreds or even thousands within a publisher or journal. A signature move is to submit the same paper to multiple journals at once to maximize the chance of getting in, according to an industry trade group now monitoring the problem. Publishers say some fraudsters have even posed as academics to secure spots as guest editors for special issues and organizers of conferences, and then control the papers that are published there.

“The paper mill will find the weakest link and then exploit it mercilessly until someone notices,” said Nick Wise, an engineer who has documented paper-mill advertisements on social media and posts examples regularly on X under the handle @author_for_sale.

The journal Science flagged the practice of buying authorship in 2013. The website Retraction Watch and independent researchers have since tracked paper mills through their advertisements and websites. Researchers say they have found them in multiple countries including Russia, Iran, Latvia, China and India. The mills solicit clients on social channels such as Telegram or Facebook, where they advertise the titles of studies they intend to submit, their fee and sometimes the journal they aim to infiltrate. Wise said he has seen costs ranging from as little as $50 to as much as $8,500.

When publishers become alert to the work, mills change their tactics.

“It’s like a virus mutating,” said Dorothy Bishop, a psychologist at the University of Oxford, one of a multitude of researchers who track fraudulent science and has spotted suspected milled papers.

For Wiley, which publishes more than 2,000 journals, the problem came to light two years ago, shortly after it paid nearly $300 million for Hindawi, a company founded in Egypt in 1997 that included about 250 journals. In 2022, a little more than a year after the purchase, scientists online noticed peculiarities in dozens of studies from journals in the Hindawi family.

Scientific papers typically include citations that acknowledge work that informed the research, but the suspect papers included lists of irrelevant references. Multiple papers included technical-sounding passages inserted midway through, what Bishop called an “AI gobbledygook sandwich.” Nearly identical contact emails in one cluster of studies were all registered to a university in China where few if any of the authors were based. It appeared that all came from the same source.

“The problem was much worse and much larger than anyone had realized,” said David Bimler, a retired psychology researcher in Wellington, New Zealand, who started a spreadsheet of suspect Hindawi studies, which grew to thousands of entries.

Within weeks, Wiley said its Hindawi portfolio had been deeply hit.

Over the next year, in 2023, 19 Hindawi journals were delisted from a key database, Web of Science, that researchers use to find and cite papers relevant to their work, eroding the standing of the journals, whose influence is measured by how frequently its papers are cited by others. (One was later relisted.)

Wiley said it would shut down four that had been “heavily compromised by paper mills,” and for months it paused publishing Hindawi special issues entirely as hundreds of papers were retracted. In December, Wiley interim President and Chief Executive Matthew Kissner warned investors of a $35 million to $40 million revenue drop for the 2024 fiscal year because of the problems with Hindawi.

According to Wiley, Tuesday’s closures are due to multiple factors, including a rebranding of the Hindawi journals and low submission rates to some titles. A company spokesperson acknowledged that some were affected by paper mills but declined to say how many. Eleven were among those that lost accreditation this past year on Web of Science.

“I don’t think that journal closures happen routinely,” said Jodi Schneider, who studies scientific literature and publishing at the University of Illinois Urbana-Champaign.

The extent of the paper mill problem has been exposed by members of the scientific community who on their own have collected patterns in faked papers to recognize this fraud at scale and developed tools to help surface the work.

One of those tools, the “Problematic Paper Screener,” run by Guillaume Cabanac, a computer-science researcher who studies scholarly publishing at the Université Toulouse III-Paul Sabatier in France, scans the breadth of the published literature, some 130 million papers, looking for a range of red flags including “tortured phrases.”

Cabanac and his colleagues realized that researchers who wanted to avoid plagiarism detectors had swapped out key scientific terms for synonyms from automatic text generators, leading to comically misfit phrases. “Breast cancer” became “bosom peril”; “fluid dynamics” became “gooey stream”; “artificial intelligence” became “counterfeit consciousness.” The tool is publicly available.
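To make the idea concrete, here is a minimal sketch of a tortured-phrase check. It is not the Problematic Paper Screener's implementation; the lookup table holds only the three pairs quoted above, and the function name is hypothetical.

```python
import re

# Assumed lookup table: only the three pairs quoted in the article.
TORTURED_PHRASES = {
    "bosom peril": "breast cancer",
    "gooey stream": "fluid dynamics",
    "counterfeit consciousness": "artificial intelligence",
}


def find_tortured_phrases(text):
    """Return (tortured phrase, likely intended term) pairs found in text."""
    # Collapse whitespace so phrases split across line breaks still match.
    normalized = re.sub(r"\s+", " ", text.lower())
    hits = []
    for phrase, intended in TORTURED_PHRASES.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", normalized):
            hits.append((phrase, intended))
    return hits
```

A fixed list like this only catches known substitutions, which is presumably why a real screener would need a much larger, continually updated list.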

Another data scientist, Adam Day, built “The Papermill Alarm,” a tool that uses large language models to spot signs of trouble in an article’s metadata, such as multiple suspect papers citing each other or using similar templates and simply altering minor experimental details. Publishers can pay to use the tool.

With the scale of the paper-mill problem coming into ever better focus, it has forced publishers to adjust their operations.

IOP Publishing has expanded teams doing systematic checks on papers and invested in software to document and record peer review steps beyond their journals.

Wiley has expanded its team working to spot bad papers and announced its version of a paper-mill detector that scans for patterns such as tortured phrases. “It’s a top three issue for us today,” said Jay Flynn, executive vice president and general manager of research and learning, at Wiley.

Both Wiley and Springer Nature have beefed up their screening protocols for editors of special issues after seeing paper millers impersonate legitimate researchers to win such spots.

Springer Nature has rejected more than 8,000 papers from a suspected paper mill and is continuing to monitor its work, according to Chris Graf, the publisher’s research-integrity director.

The incursion of paper mills has also forced competing publishers to collaborate. A tool launched through STM, the trade group of publishers, now checks whether new submissions were submitted to multiple journals at once, according to Joris van Rossum, product director who leads the “STM Integrity Hub,” launched in part to beat back paper mills. Last fall, STM added Day’s “The Papermill Alarm” to its suite of tools.

While publishers are fighting back with technology, paper mills are using the same kind of tools to stay ahead.

“Generative AI has just handed them a winning lottery ticket,” Eggleton of IOP Publishing said. “They can do it really cheap, at scale, and the detection methods are not where we need them to be. I can only see that challenge increasing.”

fun paragraph:

Cabanac and his colleagues realized that researchers who wanted to avoid plagiarism detectors had swapped out key scientific terms for synonyms from automatic text generators, leading to comically misfit phrases. “Breast cancer” became “bosom peril”; “fluid dynamics” became “gooey stream”; “artificial intelligence” became “counterfeit consciousness.”

twig93 · May 14, 2024, 6:09pm

Good grief.

meep · May 14, 2024, 6:17pm

More on this saga

Less than three weeks after our report, Elsevier told us it would pull the study, “Green innovations and patents in OECD countries,” which appeared last year in the Journal of Cleaner Production. On May 4, the publisher issued a retraction notice stating:

Concerns were raised about the data in this paper post-publication. The first author confirmed to the Editors that the data contained many gaps across countries and over time. The majority of the missing unit values appeared in beginning and end year years (1990–1992 and 2017–2018). To represent country data over time, the first author imputed the missing unit values using forward and backward trends based on three consecutive values in Excel. The first author claims that this resulted in a balanced dataset inclusive of the sample countries and further states that the imputation of missing values based on variable—specific trend will not change the result and large number of observations will produce more stable estimated coefficients.

However, the first author agrees that he has not explained the imputation in the data section of the article and the effects that the imputed data has on the result has not been tested. Since a total of 36 variables had missing units, few observations would have remained.

The Editors and the authors have concluded that the findings of the paper may be biased.
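The notice does not give the exact Excel operations, so the following is only a sketch of what trend-based filling from three consecutive values might look like: fit a straight line to the three nearest observed values and extend it over the missing years at either end. The function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_edges(series):
    """Fill leading and trailing None values by linear extrapolation.

    Assumption: each edge is extended using a line fitted to the three
    nearest observed values. Gaps in the middle are left untouched.
    """
    observed = [i for i, v in enumerate(series) if v is not None]
    if len(observed) < 3:
        raise ValueError("need at least three observed values")
    filled = list(series)
    head, tail = observed[:3], observed[-3:]
    slope, intercept = linear_fit(head, [series[i] for i in head])
    for i in range(observed[0]):
        filled[i] = slope * i + intercept
    slope, intercept = linear_fit(tail, [series[i] for i in tail])
    for i in range(observed[-1] + 1, len(series)):
        filled[i] = slope * i + intercept
    return filled
```

Every filled value lies exactly on the fitted line, so it adds observations without adding information, which is one way to see why the untested effect on the results was a concern.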

…

Heshmati has used the same dataset in past research. As we reported in February:

In 2020, he and two colleagues published a paper in Empirical Economics, a Springer Nature title, that bore strong resemblance to the 2023 article and relied on the same patched-up dataset. The article mentioned neither the data gaps nor the Excel operations.

The publisher told us at the time it was “looking into the matter carefully.” In a May 9 email, however, Deborah Kendall-Cheeseman, a communications manager at Springer Nature, told us:

I’m afraid we don’t have an update we can share at the moment but we’ll let you know once we do.

meep · May 14, 2024, 6:19pm

Retraction Watch keeps a list of published journal papers it suspects were written with ChatGPT

without necessary editing

Mathman · May 14, 2024, 6:50pm

Priceless.

meep · June 21, 2024, 9:38am

Delving into ChatGPT usage in academic writing through excess vocabulary

Dmitry Kobak, Rita González Márquez, Emőke-Ágnes Horvát, Jan Lause

Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How wide-spread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions on academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess words usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact in the scientific literature, surpassing the effect of major world events such as the Covid pandemic.

meep · June 22, 2024, 1:22pm

Money is to be made!

Scientists—like the rest of us—are not immune to errors. “It’s clear that errors are everywhere, and a small portion of these errors will change the conclusions of papers,” says Malte Elson, a professor at the University of Bern in Switzerland who studies, among other things, research methods. The issue is that there aren’t many people who are looking for these errors. Reinhart and Rogoff’s mistakes were only discovered in 2013 by an economics student whose professors had asked his class to try to replicate the findings in prominent economics papers.

With his fellow meta-science researchers Ruben Arslan and Ian Hussey, Elson has set up a way to systematically find errors in scientific research. The project—called ERROR—is modeled on bug bounties in the software industry, where hackers are rewarded for finding errors in code. In Elson’s project, researchers are paid to trawl papers for possible errors and awarded bonuses for every verified mistake they discover.

The idea came from a discussion between Elson and Arslan, who encourages scientists to find errors in his own work by offering to buy them a beer if they identify a typo (capped at three per paper) and €400 ($430) for an error that changes the paper’s main conclusion. “We were both aware of papers in our respective fields that were totally flawed because of provable errors, but it was extremely difficult to correct the record,” says Elson. All these public errors could pose a big problem, Elson reasoned. If a PhD researcher spent her degree pursuing a result that turned out to be an error, that could amount to tens of thousands of wasted dollars.

Error-checking isn’t a standard part of publishing scientific papers, says Hussey, a meta-science researcher at Elson’s lab in Bern. When a paper is accepted by a scientific journal—such as Nature or Science–it is sent to a few experts in the field who offer their opinions on whether the paper is high-quality, logically sound, and makes a valuable contribution to the field. These peer-reviewers, however, typically don’t check for errors and in most cases won’t have access to the raw data or code that they’d need to root out mistakes.

The end result is that published science is littered with all kinds of very human errors—like copying the wrong value into a form, failing to squash a coding bug, or missing rows in a spreadsheet. The ERROR project pairs authors of influential scientific papers with reviewers who go through their work looking for errors. Reviewers get paid up to 1,000 Swiss francs ($1,131) to review a paper and earn bonuses for identifying minor, moderate, and major errors. The original authors are also paid for submitting their paper. ERROR has 250,000 Swiss francs from the University of Bern to pay out over four years, which should be enough for about 100 papers.

Jan Wessel, a cognitive neuroscientist at the University of Iowa, was the first scientist to have his work checked by ERROR. Elson already knew Wessel, and asked the researcher whether he’d like to take part in the project. Wessel agreed, on the proviso that he submitted a paper where he was the lone author. If they found a major error, Wessel wanted it to be clear that it was his mistake alone, and not risk jeopardizing the career of a colleague or former student.

the project:

meep · June 22, 2024, 1:25pm