Large language models (LLMs) have transformed how written content is produced, but the extent of their influence on scientific writing has been hard to pin down. A recent study by researchers at Germany’s University of Tübingen and Northwestern University sheds light on the prevalence of LLM usage in scientific abstracts published in 2023 and 2024. By analyzing the frequency of specific “excess words” in abstracts, the researchers estimated that at least 10 percent of papers from 2024 were processed with LLMs.
To conduct their study, the researchers analyzed 14 million paper abstracts indexed on PubMed between 2010 and 2024. They tracked the relative frequency of specific words, comparing the frequency expected from pre-2023 trends with the frequency actually observed in abstracts from 2023 and 2024, after LLMs had come into widespread use. The results revealed a significant increase in the usage of certain style words that were previously uncommon in scientific writing.
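As a rough illustration of this kind of excess-frequency analysis, the sketch below extrapolates a word’s pre-2023 frequency trend and compares it with the frequency observed in 2024. The yearly counts are hypothetical placeholders rather than the study’s data, and the simple linear extrapolation is an assumed stand-in for the researchers’ actual counterfactual projection.

```python
# Toy excess-frequency comparison: hypothetical counts, not the study's data.
import numpy as np

# Hypothetical per-year data: abstracts containing the word "delves"
# and total abstracts published, for 2010-2024.
years = np.arange(2010, 2025)
word_counts = np.array([30, 32, 35, 33, 36, 40, 41, 43, 45, 48, 50, 52, 55, 300, 1400])
total_abstracts = np.array([700_000] * 13 + [720_000, 740_000], dtype=float)

# Observed relative frequency of the word in each year.
observed = word_counts / total_abstracts

# Fit a linear trend on the pre-LLM years (2010-2022) only.
pre_mask = years <= 2022
slope, intercept = np.polyfit(years[pre_mask], observed[pre_mask], deg=1)

# Extrapolate the expected (counterfactual) frequency for 2024.
expected_2024 = slope * 2024 + intercept
observed_2024 = observed[years == 2024][0]

# Excess usage: the gap and the ratio between observed and expected frequency.
excess_gap = observed_2024 - expected_2024
excess_ratio = observed_2024 / expected_2024

print(f"expected 2024 frequency: {expected_2024:.2e}")
print(f"observed 2024 frequency: {observed_2024:.2e}")
print(f"excess gap: {excess_gap:.2e}, excess ratio: {excess_ratio:.1f}x")
```

With these made-up numbers, the word turns up roughly 20 times more often in 2024 than the pre-2023 trend would predict; applied to real abstract counts, the same comparison flags the “excess words” the study describes.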
Words like “delves,” “showcasing,” and “underscores” surged in post-LLM abstracts, with some appearing up to 25 times more frequently in 2024 papers than pre-LLM trends would predict. Common words like “potential,” “findings,” and “crucial” also saw notable increases after LLMs arrived. The researchers noted that these changes were unprecedented in both quality and quantity, signaling a shift in vocabulary choice in scientific writing.
While language naturally evolves over time, the researchers highlighted that the sudden, massive increases in word usage after LLMs arrived were distinct from changes driven by major world health events. Unlike the rise of content words such as “ebola” and “zika” during previous health crises, the excess words identified in post-LLM abstracts were primarily style words: verbs, adjectives, and adverbs. This shift in linguistic patterns suggests a direct influence of LLMs on scientific writing practices.
The study’s findings have significant implications for the scientific community. By identifying marker words indicative of LLM usage, researchers can better gauge the extent to which these tools are shaping written content. The researchers estimated that at least 10 percent of 2024 papers were written with LLM assistance, and because this figure is a lower bound derived only from detectable marker words, the actual share could be considerably higher. This raises questions about the potential impact of LLMs on the credibility and originality of scientific research.
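To see why such an estimate is a lower bound rather than an exact figure, consider a toy version of the excess-word logic, with made-up frequencies rather than the study’s data: every abstract that contains a marker word beyond what the pre-LLM trend predicts must have had that word introduced somehow, so the excess fraction gives a floor on how many abstracts were LLM-processed.

```python
# Toy lower-bound illustration: frequencies are fabricated for demonstration
# and this is not the paper's exact estimator.
expected_frequency = {"delves": 0.0001, "showcasing": 0.0006, "crucial": 0.0190}
observed_frequency = {"delves": 0.0026, "showcasing": 0.0031, "crucial": 0.0370}

# Each word's excess gap is an independent lower bound on the fraction of
# LLM-processed abstracts; the largest gap is the tightest single-word bound.
gaps = {w: observed_frequency[w] - expected_frequency[w] for w in expected_frequency}
lower_bound = max(gaps.values())
print(f"single-word lower bound on LLM-processed abstracts: {lower_bound:.1%}")
```

Abstracts processed by an LLM without picking up any marker word are invisible to this approach, which is why the true share can only be higher than the bound.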
The study highlights the growing influence of large language models on scientific writing practices. The increased prevalence of certain style words in post-LLM abstracts points to a measurable shift in the vocabulary of scientific prose. As LLMs continue to advance, researchers will need to critically assess the implications of their usage for the quality and integrity of scientific literature. The findings serve as a reminder of the need for vigilance and transparency in the era of AI-driven content generation.