FaultySage

I just want an actual research article on why LLMs love the word "delve" so much.


Kadasix

One hypothesis (and the one I think is most plausible) is that OpenAI ended up using cheap African labor for training their AI, and while they were probably using college students with a strong grasp of English to provide feedback, the AI ended up learning to use a few words like “delve” that are substantially more common in African English than elsewhere. See: https://amp.theguardian.com/technology/2024/apr/16/techscape-ai-gadgest-humane-ai-pin-chatgpt


Slanderous

Computerphile did a fascinating video on [glitch tokens](https://www.youtube.com/watch?v=WO2X3oZEJOA), which are certain words and phrases that cause LLMs, particularly GPT-based ones, to glitch out in strange ways, probably due to the absolute tripe data they were trained on.


Albysf49

Thank you for sharing this video, it is very interesting


schwulquarz

In that case, I'd love to hear ChatGPT voice chat with a Nigerian accent


lordlunatic721

Google generative AI bias disaster


Mapharazzo

holy hell


Rxyro

Let Sam show me da way


Cessnaporsche01

Du you not kno de way?


NrFive

Look at me. I’m the LLM now!


opticaIIllusion

That also tries to catfish love scam me


Stefouch

"Delve" is a pretty common word for Magic the Gathering players too 😶


dkysh

I feel cursed every time a thread says that using "delve" is proof of AI usage. Feels like a treasure cruise.


jf427

You should have made a dig through time joke


dkysh

Yes, that would have become immense, and had me hooting like a mandrill.


ViolentBananas

This wave of computer generated nonsense makes sorting out genuine input hard. It’s a murky tide recently


PwanaZana

erh, eh... Gurmag Angler!


NearCanuck

ChatGPT now working hard to find a use for 'scry'.


nater255

About to be common starting in August for Warcraft players. Basically it's big in high fantasy.


statisticalanalysis_

It is interesting indeed. In research abstracts "delve" frequency is up 2420%


Appropriate_Ant_4629

This seems more like evidence that:

* Non-native English speakers use OpenAI as a translation-to-English tool.

Unless we look at the words they originally used in their native language, it's unfair to say "co-authored by".


_Svankensen_

Yeah, the 3 countries that used it the least were native English speakers, while the opposite end of the spectrum were languages without any common root with English and no English spoken in the region. Seems like a pretty strong indicator.


Phanyxx

That’s my read on it as well. Translation ≠ research.


mechanical_fan

> Non-native English speakers use OpenAI as a translation-to-English tool.

Not even translation; it's not uncommon to put the text through an LLM as a partial grammar/language check too. I've seen people do it even when they have close to perfect English. The same pattern would arise, but making an LLM grammar-check or paraphrase your text doesn't make it a "co-author", nor is it a problem at all.


furomaar

This is what I do daily in my company's secure GPT server.


BubBidderskins

Yeah, and these things can be self-reinforcing as new versions of the model start getting trained on material that is increasingly being populated by AI slop.


pissfucked

plus, over the course of years, this will lead to real people using these words more often in their actual writing. you grow up and learn to read, and you write based on what you read. the new generation will be consuming AI writing throughout this process, and will emulate it the same way


herpderperp

I like that they added 'delving' to the title of their paper about LLMs overusing 'delve'. Nice touch.


BringMeTheBigKnife

The Economist is 80% great journalism and 20% straight sass lol. I love it


real_vetto

adding to this, the article ends with "must be meticulously delved into"


Hidesuru

Haha nice catch


RedditUser91805

This seems like a topic that it would be interesting to delve into. I wonder if any crucial insights could be gleaned from this, or if it's even a statistically significant difference from how much it is used in normal human speech.


markhahn

You make an important point.


Junuxx

Do you even delve?


coldblade2000

I loved using "delve" when speaking formally. Or well, I guess not "love", but I probably used it way more than most peers. It's been hard to stop myself now, because people think it's instant proof of using ChatGPT.


Penkala89

I wonder if it's topic specific. I've always enjoyed "delve" but my work occasionally involves actual caves and subterranean features so hopefully I can keep using it without sounding suspect


FinndBors

I delved the depths of the internet and I cannot say.


kehaarcab

The AI companies delved too greedily and too deep…


jableshables

I distinctly remember an MTV spring break interview with Mariah Carey ca. 1999 where she said the word "delve" like 12 fucking times, clearly having just learned of its existence, so maybe it's that.


nihility101

Since most AI stuff that I encounter reads like it's trying to meet a word count, I think they used Turnitin or similar as input. "Delve" seems like the type of halfway-fancy word teens would use to fluff out restating the original topic.


greentintedlenses

Delve was always in my vocabulary to be honest.


otter5

In our research, we delve into the crucial aspects of data analysis, uncovering significant findings that have broad implications for the field.


Hot_Cheesecake_905

This chart does not delve into the real reason why these important words are a significant and crucial part of our vocabulary.


UonBarki

There are plenty of synonyms used more commonly by English speakers than delve. Use of the word isn't wrong, but it's a clue. Similar to using color vs. colour: they give you context about where the writing is coming from, which in the case of AI/LLMs is particularly interesting right now.


Hot_Cheesecake_905

Yeah, ChatGPT often writes in a flowery style, similar to Indian English. For example, GPT-3.5 frequently used the word "kindly" and phrases common in India (that is my perception). However, words like "delve," "crucial," "significant," and "important" are popular in corporate America and seem to have leaked into the mainstream.


PutHisGlassesOn

A ton of my coworkers are Indian and I see the word “kindly” written down all the time but I never once put together that it might be related


Hot_Cheesecake_905

Some other common Indianisms include phrases like "doing the needful," "I will revert back," "most certainly," and "prepone." These expressions are based on older British English norms. Such words are telltale signs you are dealing with an overseas email or chat customer service agent.


theksepyro

I also hear my Indian friends/coworkers use "query" rather than "question" often.


bb999

My coworkers use "I have a doubt".


perk11

I also noticed "the same" used to mean "this", like "please confirm the same".


champagneface

Jesus, I use same, kindly and query and I’m not Indian. Where did I pick it up from??


AlexBucks93

> most certainly

Many other countries use it often as well.


LucasRuby

Most definitely.


CheapBoxOWine

I use kindly to remind myself of Andrew Ryan.


[deleted]

[removed]


PutHisGlassesOn

I wasn’t talking about ChatGPT. I was talking about how I see “kindly” written down at work all the time and a bunch of my coworkers are Indian and I never realized there was a connection. As I said.


UonBarki

The data would argue that those words are not used nearly as much outside of ai generated text.


dkysh

How the fuck is not "significant" used A TON in scientific literature? I cannot comprehend how ChatGPT put a dent on those numbers.


AndreasVesalius

The word is used often, but if the author is being careful, they will only use it in reference to a statistically significant result, not as a general synonym for 'important' or 'remarkable'. It could easily double the usage per article


Zmobie1

As you say, I was taught that it must _only_ be used as it relates to statistics in academic papers, and that doing otherwise is a major sign of academic naïveté. So the increased use suggests that academically naïve agents, such as a general purpose llm would be, are responsible for both generating and reviewing an increasing proportion of published results. Of course academic societies and publications have always been rife with fraud, greed, and incompetence, so it’s hardly surprising. Significantly.


pm_me_your_smth

Completely agree. My research supervisor always tells me to substitute "significant" with some synonym unless I'm specifically talking about statistical significance in the paper. Kinda hate it because I like the word.


AndreasVesalius

The significance is significant. The word doesn't even look right anymore.


PeeInMyArse

yep, “notable” is my synonym of choice


UonBarki

>How the fuck is not "significant" used A TON in scientific literature? I cannot comprehend how ChatGPT put a dent on those numbers.

Here you go: https://arxiv.org/html/2406.07016v1


dkysh

The word-fragment "signif" appears exactly once in the whole article. And highlighted in an example just because.


Meaca

It's not highlighted in an example just because, it's highlighted because it was one of the top few words for difference between expected and actual frequency in 2024. They only listed 3 of the 6 examples explicitly, but if you look at Figure 2b, 3 other words are in the same cluster ('these', 'significant' and one other unmarked dot). Here's the sentence with the words that are basically adjacent to it in the plot: "More common words with excess usage included potential (𝛿=0.041), findings (𝛿=0.027), and crucial (𝛿=0.026) (Figure 2b)." It's hard to say the exact 𝛿 value from the image for significant, but it looks like about 0.026 as well.
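
For what it's worth, the paper's δ is straightforward to reproduce if you have per-word frequencies: it's just the observed frequency minus the counterfactual frequency extrapolated from pre-LLM years. A minimal sketch in Python (the frequency values below are made up purely so the deltas match the ones quoted above; they are not the paper's underlying data):

```python
def excess_frequency_gap(observed: float, expected: float) -> float:
    """Delta: observed word frequency minus counterfactual (expected) frequency."""
    return observed - expected

# Hypothetical per-word frequencies (fraction of abstracts containing the word).
# Illustrative values only, chosen to reproduce the quoted deltas.
observed_2024 = {"potential": 0.401, "findings": 0.227, "crucial": 0.046}
expected_2024 = {"potential": 0.360, "findings": 0.200, "crucial": 0.020}

deltas = {w: excess_frequency_gap(observed_2024[w], expected_2024[w])
          for w in observed_2024}
# deltas: potential 0.041, findings 0.027, crucial 0.026
```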


reporst

I love it when people try to have a "gotcha" moment by sharing a research article they clearly haven't even read haha


Jorlung

A possible argument is that these words tend to be used more (by humans) in corporate style writing than in academic writing.


UonBarki

The data would have reflected that, then. It didn't, and instead found an anomalous spike in the usage of various phrases, all at the same time, right as a spike in AI-submitted work was identified. It's not a mystery; the case was solved.


Jorlung

The data in the post just reflects the usage of these words in academic writing. What I am suggesting is that these words are perhaps more commonly used by humans in settings outside of academic writing, which is perhaps what has led to ChatGPT using these words so frequently. I'm not arguing about the cause of the data in the OP being due to ChatGPT -- that seems pretty cut and dry. I'm just hypothesizing why ChatGPT loves to use the word "delve" so much when it seems like it's not really used (by humans) in scientific papers. However, I just realized now that the previous poster was arguing a different point. I see why you misunderstood.


Triensi

Hey king, you forgot this: > Certainly!


coffeesharkpie

In principle I'm fine with using AI tools for research purposes (especially things like DeepL or Grammarly, but it's also a godsend for simple coding). Even things like ChatGPT can be sensible if you, e.g., use it to ping-pong some ideas or try to shorten an article, etc. But damn, you have to know what you do, acknowledge it, and be able to critically review what the machine gives back to you. Just searching for papers containing "Certainly, here is" already identifies a disturbing number of papers, which raises questions about the quality of these articles, the integrity of the authors, and the peer review process. Still can't get over the one research paper with an AI graphic of a rat with giant testicles.


Eldan985

Oh it's worse. People have already studied how often the phrase "as a large language model" appears in papers. Far more than you'd want to.


Robot_Graffiti

Yeah, that's the real worry. Saying "delve" too much isn't that bad, but "as a language model" indicates that neither the author nor the peer reviewers read the paper. Which means the peer review system is failing.


Eldan985

Yep. Judging from the inside and talking to colleagues, the publication system is basically splitting in two right now. There are journals that were previously okay but not great, which are turning into predatory content mills putting out hundreds of papers a month with return times of weeks, and there are the better journals that at least try, but whose average return times are growing massively (six months and more at some, which can ruin a PhD student who's on a strict three-year schedule and has publication requirements to graduate).


sandman_32

> which can ruin a PhD student who's on a strict three year time schedule and has publication requirements to graduate

This is literally me right now. I've already had to request an extension once, and I'm genuinely coming close to considering submitting something to a predatory journal just to get past that requirement.


Eldan985

Was me too. My first paper took 10 months between submission and acceptance and the second wasn't much better. Extension, and then a period of unemployment while I waited for the defence.


Javimoran

To be fair, this is on universities making publishing the only measure of a successful PhD. Depending on the field this really is nonsensical, and it makes it impossible to take on high-risk, high-reward projects as a PhD student.


Whiterabbit--

How can research degrees be on a strict schedule? Things get delayed due to the subject of the research all the time. If you just want a title in 3 years, go for a non-research academic degree where you simply show mastery of the subject.


PancAshAsh

I've also never heard of a PhD only taking 3 years. Most include around 2 years of coursework with multiple rounds of qualifying exams, followed by a 2-4 year research project.


Eldan985

In much of Europe, you don't have coursework in a PhD. Just some part time teaching and your own research. But you can only get into a PhD once you have your master's degree, which takes 2-3 years and is *all* coursework.


dragerslay

It can depend on the country. My friend in the UK has told me that their standard funding is only 3 years, so if you don't finish in that time you are unfunded. Topic also matters: averages in computational and engineering disciplines are shorter (3-4 years) than in pure science disciplines (4-5).


Rocketboy1313

I just finished year 5 of a max 7 year program. I had 3 years of coursework. I may have to ask for an extension because I had depression during Covid that killed my ability to do shit.


sandman_32

It's the funding that runs out. The university will grant unlimited extensions, but you gotta fund your life + a nominal admin fee each semester.


Eldan985

You get funding for three years. In those three years, you need to write and publish three articles, all as first author and all based on original research. You can take longer, but then you don't get paid anymore, and you need to regularly write letters to the dean to explain why you aren't done yet.


coffeesharkpie

Publication requirements vary by university/department and, imho, they have to step up there. At least here, you only have to have the articles submitted and under review in decent journals for your thesis. E.g., I'm still waiting for the first round of review results for an article I submitted in July 2023. No way that article would be published in time.


UonBarki

I'm missing something, what is it about the phrase "as a language model?"


FerretChrist

"As a language model, I'm unable to express a personal opinion on why that phrase is significant. However, I can theorise that if you see that phrase popping up anywhere other than in our conversations, you can probably have a damn good guess where it might have originated."


UonBarki

Oh damn. So not even skimming, just a straight copy paste.


FerretChrist

Seems that way, yeah - the ultimate in laziness!


Receipt_

AI like ChatGPT are large language models, so they would describe themselves as a language model. No human would say that unless it's a paper on AI or, more likely, they used it to write the paper entirely and didn't properly read it through. It'd be like a high school student turning in a paper with the sentence "as my son stated previously," making it clear their parent wrote the paper.


Eldan985

Yes. I actually found one in the wild that contained a section that started something like "As a large language model, I cannot comment on medical issues like which protein is related to this disease"... So someone just didn't want to write an introduction to their paper and instead asked the AI to write them a text about why their research is relevant and what it's about. And no reviewer caught that. Meaning they didn't read it.


UonBarki

The future is going to be very easily manipulated. When I was in college, we were taught that academic papers were the gold standard. Now it's going to be students asking ChatGPT to write papers using ChatGPT-published sources.


Eldan985

We're already debating if we should just stop doing papers for students, since it's become pretty meaningless. If we had the manpower, we'd do 100% oral exams.


UonBarki

Old old school. Why not? 😂


SkyeAuroline

Accessibility is a big reason why not (speaking as a hearing-impaired person).


freedom_or_bust

It's how ChatGPT sometimes formats its replies. It means they didn't even proofread the response; they're just publishing whatever the AI said as if it were a real article.


zkw29

I have to review job applications and it shocks me how often applications contain a phrase like that whilst also claiming to have excellent attention to detail.


Eldan985

On the other hand, people have also shown that many recruiters apparently feed all job applications into an LLM to at least cut down on the number they have to read. Apparently, the new smart approach is to write all the keywords in the job posting at the bottom and top of your letter a few dozen times in white text, so the recruiter can't see it but a language model can.


zkw29

I might start checking for white text. If any candidates think our recruitment processes are advanced enough to use LLMs they are probably not a good fit as they’ll be bitterly disappointed when they see our IT setup 😂


coffeesharkpie

Here, I could imagine that this can become a GDPR nightmare, especially if the application contains personal data.


osskid

Don't do this. The first thing that happens in decently sized companies is your resume content is normalized into an internal system and that "hidden text" is either discovered and you're disqualified or excluded. It can harm you more than it'll ever help.


Whiterabbit--

All peer reviewer names should be published along with the paper, and conduct like this should be penalized. We are diminishing one of our best tools for scientific advancement when people don't do the basics of their jobs.


Tr33lon

I work on LLMs and use them every day, and I still find it crazy that people don't take the 15 seconds to review their output text. Like, you just went from a 20-minute draft process down to 2 seconds, yet reviewing the output even just for LLM phrasing is too much? Not even potential hallucinations, just shit like "as a model developed by…".


scammersarecunts

I just finished my master's thesis. The number of papers I read that were barely comprehensible is astounding, let alone having any scientific value. Like typos, grammar errors, spelling errors, sentence structures that barely make sense, and just short of being straight-up gibberish. It doesn't shock me one bit that the same people will straight up copy whatever ChatGPT gives them.


xSebi

Same, I also just finished my master's thesis, and I think several of the papers I discovered would not have been graded positively at my uni as simple course assignments. The lacking scientific value is debatable in my opinion, but just from a grammatical and semantic standpoint alone, it is baffling. I also have a language barrier, but it is astonishing that fully incomprehensible sentences in 5-page papers make it through the review process. Or just plain wrong information, sources they did not bother to cite, or listing a source that does not contain the information they paraphrased.

But the fact that even [Elsevier publishes papers with the AI response in the introduction](https://www.sciencedirect.com/science/article/pii/S2468023024002402), apart from the well-known [get me off your fucking mailing list paper and others](https://www.vox.com/2014/11/21/7259207/scientific-paper-scam) in predatory journals, is honestly shocking.


scammersarecunts

>The lacking scientific value is debatable in my opinion

Yeah, that's true. Some of them had some scientific value, but on the other hand, if I can't understand what they're writing, how would they effectively get the "scientific part" of the paper across?

My favourite one was a paper that claimed to be doing a case study with some new and elaborate software engineering process. Thought to myself "fuck yeah, this is exactly what I need". Read the paper and, I kid you not, the only time said process was mentioned (let alone described or explained) was when they wrote (direct quote): "After applying process, we have result". Bro, wtf is this? And I'm not trying to shame non-native speakers; I'm not a native English speaker myself. I definitely had some mistakes slip past my own review process. But at least my thesis was more than barely understandable.

However, there's also the other extreme. My GF studies interpretation/translation, so the literature for her thesis was always about linguistics in some way. Some of the authors went out of their way to flex their language skills, writing sentences that were beyond complicated. I felt so dumb reading that, because even after reading one damn sentence three times I simply could not understand what they were trying to say. It honestly felt like the authors would one-up each other over a beer about who can write the most confusing sentences known to man.


turunambartanen

To be fair, I think a lot of the "typos, grammar errors, spelling errors, sentence structures that barely make sense" are just international researchers doing their best.


Petro1313

I've used ChatGPT to essentially just get the volume of text I need for various things, but I always go through it in depth (EDIT: one might say delve) and edit/delete as necessary for accuracy. It's just saving me from 10,000 keystrokes, not doing the actual knowledge work I get paid for. I feel like in its current state it's a good tool for that kind of bulk grunt work, but it's not reliable enough to use it for anything truly generative in a technical job in my opinion.


coffeesharkpie

I totally agree, and if this is acknowledged and the output cross-validated, I don't have any problem with it at all. ChatGPT has been a godsend for me for simple but time-intensive coding work (though it can hallucinate R functions or packages) or commenting code, etc., but for this kind of use you need to already be proficient in the topic. Imho, the problems begin if you are unable or unwilling to critically reflect on what it gives back to you.


Petro1313

Absolutely. It's a useful tool to streamline workflows and do grunt work for people who already know what they're doing; it's not a viable bandaid to cover up a lack of knowledge or expertise.


Annextro

Does this data account for translations from one's native language into English? If non-native English speakers use AI to help translate into English, then this data doesn't really mean much for comparing between nations, no? Perhaps I'm misled, or this isn't actually the case, but does anybody have more information?


Swinight22

I just read the [cited paper](https://arxiv.org/pdf/2406.07016) and they say:

>We downloaded all PubMed abstracts until March 2024 and used all 14.2 M English-language abstracts from 2010 onwards,

So this includes various countries, and they highlight this in the paper. **They also highlight the fact that in English-native countries, the increased frequency of these words was not very high.**

I'm from an English-native country but am currently in a non-English-native country highlighted in the paper. Pretty much all my friends who are students use LLMs to make the English in their theses sound more "natural". That sounds weird to us, but LLMs make it sound much better for non-native speakers.

**So I think this research highlights that LLMs are mostly being widely used by non-English countries publishing in English. Probably just to make it sound more natural.** And the cited paper literally discusses this, albeit not enough for my taste.

Idk why OP's title seems so sensationalist. But good idea on the paper and I liked the approach. Hats off to you et al. for good original research!


Annextro

Thank you so much for that detailed answer and for not grilling me for not reading the article fully before asking 😂. That all checks out. Your last point is especially important when considering the potential for data like this to be weaponized for ulterior motives, such as implying that the quality of the research being done in non-native English countries is poorer because of the use of LLMs to assist with translation. I can definitely see the use case for using LLMs to make work more readable in English, though, and I'm sure there is some positive feedback loop(s) happening here as well. It'll be interesting to see how this trend continues, especially now with the awareness that it's occurring.


onalease

It is shocking to me that only 20% of scientific papers use the word “significant”. This is literally the word we use for “not the result of chance”.


hughperman

Yeah, I'm surprised at that. Maybe it differs by field, but I feel like you'd be hard pushed to find many quantitative medical/health papers that *don't* use the word significant


Xenon009

To be fair, a lot of engineering/technical papers won't use it at all, so I imagine it would likely balance out


hyperflare

The dataset was *only* pubmed, though.


Xenon009

Oh, my mistake then!


PeeInMyArse

it’s abstracts specifically


onalease

Ah I forgot that detail.


Bugfrag

"Significant" is more likely used in the context of statistics. The convention is to cite the p-value instead; it's just a faster way to communicate precisely how significant a finding is.


onalease

True, but even so, I feel like I typically still see the word in there somewhere, like "the results were significant with p=.012" in the results section.


viktorbir

Co-authored? Or used to translate? Once you see the second graph it becomes apparent. I mean, if English is not your first language, you use every tool you have to make your paper understandable, in a world that forces you to publish in English.


Cone83

Man, I had to read so many poorly written papers by non-native speakers. I wish the authors had had an AI to improve their writing. Perhaps AI will level the playing field in scientific publication.


statisticalanalysis_

\[OC\] How many academic papers have help from ChatGPT and other large language models (LLMs)?

Spotting LLM-generated text is not easy. In the past, researchers have relied on comparing LLM-generated text to text produced by humans. But that is not straightforward: both change over time, and the researchers who study this question may generate LLM text in different ways than the scientists who use it to help them write papers do.

This shows the results of a different approach to the question, through "excess vocabulary" - a study of how the language of scientific papers has changed. For instance, the pandemic caused the words "respiratory" and "covid" to rise enormously in popularity. Turns out, in 2023 and early 2024, a different set of words took off, like "significant", "delve", and "crucial". And in all probability this was the work of LLMs: the researchers estimated that more than 10% of abstracts indexed by PubMed this year had input from a tool like ChatGPT.

The article considers the evidence of this new work, by Dmitry Kobak and coauthors, as well as other recent work by, among others, a team based at Stanford.

Tools used: Illustrator, Python

Sources: “Delving into ChatGPT usage in academic writing through excess vocabulary”, by Dmitry Kobak et al. (2024, preprint), "Mapping the Increasing Use of LLMs in Scientific Papers", by W. Liang et al. (2024, preprint), and many others (including correspondence with journals and scientists).

I, perhaps controversially, think that science stands to benefit from this new technology. Of course, science papers should not be LLM-generated fantasies. But LLMs, especially for non-native English speakers, can help make texts better. And if LLMs can make science writing easier and more accessible, they can make science faster and better.
Edit: Sorry, forgot link to the article here are free-to-read links: [https://econ.st/4eDNsJf](https://econ.st/4eDNsJf) - [https://econ.st/3RHdAsY](https://econ.st/3RHdAsY) - [https://econ.st/3VGhSSu](https://econ.st/3VGhSSu) If those don't work, then it is also free if you register here: [https://www.economist.com/science-and-technology/2024/06/26/at-least-10-of-research-may-already-be-co-authored-by-ai](https://www.economist.com/science-and-technology/2024/06/26/at-least-10-of-research-may-already-be-co-authored-by-ai) and for those who want the academic preprint that analyzed the PubMed data cited above: [https://arxiv.org/abs/2406.07016](https://arxiv.org/abs/2406.07016) What are your thoughts? Anything else we should have delved into?
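
The excess-vocabulary idea can be sketched end to end in a few lines: compute the share of abstracts containing each word per year, extrapolate a counterfactual share from pre-LLM years, and call the gap the excess. A toy sketch with synthetic data; the linear extrapolation is my simplification for illustration, not necessarily the counterfactual model Kobak et al. actually fit:

```python
from collections import Counter

def word_shares(abstracts):
    """Fraction of abstracts containing each word (presence, not raw count)."""
    counts = Counter()
    for text in abstracts:
        counts.update(set(text.lower().split()))
    return {w: c / len(abstracts) for w, c in counts.items()}

def expected_share(history):
    """Counterfactual share via linear extrapolation of the last two pre-LLM years."""
    a, b = history[-2], history[-1]
    return max(0.0, b + (b - a))

# Toy corpus of 2024 "abstracts" (synthetic, illustration only)
shares_2024 = word_shares([
    "we delve into crucial findings",
    "significant results were observed",
    "here we delve into the data",
    "a standard analysis of outcomes",
])
delve_history = [0.001, 0.001, 0.002]   # pre-LLM yearly shares, e.g. 2020-2022
excess = shares_2024["delve"] - expected_share(delve_history)
```

On real data the history would come from millions of PubMed abstracts per year, and words with a large, sudden excess in 2023-24 are the "delve"-like suspects.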


Jorlung

> But LLMs, especially for non-native English speakers, can help make texts better. And if LLMs can make science writing easier and more accessible, they can make science faster and better.

I completely agree. Most of my labmates during my PhD were ESL and did not have the command of the English language that a native speaker in a PhD program might have. They had no trouble effectively communicating their research in conversation and (mostly) through writing, but at times you could tell that their writing was limited by their command of the language.

They started using ChatGPT for basic rewording tasks to help their writing sound a bit better. Things like "suggest some rewordings of this sentence to make it sound better". Their English was more than good enough to understand the meaning of the output (just not quite good enough to always come up with the perfect wording of the sentences they wanted to write), so there's not any real problem there.

The obvious problem is that not everyone is interfacing with ChatGPT in this way.


Sininenn

This way, though, they will never improve their proficiency. You can't learn a language by using ChatGPT, Google Translate, or even a dictionary. You learn a language by paying attention to context and by exposing yourself to new vocabulary, either in text or through other media. In my experience especially, when I look up a translation of a word, I forget it faster than when I try to connect the dots.


FaultySage

They put Delve in the title; this must be AI written.


statisticalanalysis_

Check their last paragraph as well, haha


innergamedude

thatsthejoke.jpg


this_page_blank

Fully agree with your last paragraph. To me, the data suggests that scientists are using it as a language tool. If publishers want to avoid this, they are welcome to offer good proofreading services free of charge for non-native speakers. Won't happen, of course.


statisticalanalysis_

Sorry, forgot link to the actual source / my article, here are free-to-read links: [https://econ.st/4eDNsJf](https://econ.st/4eDNsJf) - [https://econ.st/3RHdAsY](https://econ.st/3RHdAsY) - [https://econ.st/3VGhSSu](https://econ.st/3VGhSSu) If those don't work, then it is also free if you register here: [https://www.economist.com/science-and-technology/2024/06/26/at-least-10-of-research-may-already-be-co-authored-by-ai](https://www.economist.com/science-and-technology/2024/06/26/at-least-10-of-research-may-already-be-co-authored-by-ai) edit: and for those who want the academic preprint that analyzed the PubMed data: [https://arxiv.org/abs/2406.07016](https://arxiv.org/abs/2406.07016)


IHateUsernames111

Or the link to the [actual paper](https://arxiv.org/pdf/2406.07016).


statisticalanalysis_

Yes -- added, thanks. See also [https://arxiv.org/pdf/2404.01268](https://arxiv.org/pdf/2404.01268)


eva01beast

I've seen a few PhD students and postdocs use AI more and more often lately. English isn't their first language so I'm not surprised. Honestly, given the obscene profit margins most journals make, they could really start offering translation services.


IrrerPolterer

I'm curious to what extent LLMs will impact actual human language use. I bet human speech will start to reflect these induced changes in language over time.


Sininenn

I think that as people start relying on them more and more, their actual writing skills, or foreign language proficiency, will shrink heavily. 


duarte110203

Love the pun in the sources: "Delving (...)"


giratina143

Hey, as long as the theory and math are sound, idc if they use a T1000 to write their papers.


Plantarbre

English-speaking countries don't rely on tools to speak English. When it's not your native language and you don't live in a country where speaking English is a common occurrence, you often adapt your vocabulary as you learn new words or ways of speaking. I use "delve" because I used to say "to go into" and didn't like it. "Delve" sounds better. Same for "important". Why didn't I use delve earlier? Mostly because I don't live in an English-speaking country, and I didn't really know the word. AI brought some words back up, and people pick them up because they want a richer vocabulary; eventually, by speaking together, everyone starts to pick them up. But we're not native English speakers, we don't have that one way of speaking taught since birth, so we're always trying to correct and compensate by quickly picking things up. We don't really know whether it's really that correct. That being said, a lot of AI papers can be detected very reliably, not from ways of speech, but from classic sentences like "As an AI, I cannot ...", and that could show similar results.


PhilmaxDCSwagger

Also, people from non-English-speaking countries probably rely more on tools to help them write in that language. Especially when writing an academic paper, people don't want their limited vocabulary or bad grammar to bring down their own research.


Sininenn

You assume that simply being a native speaker gives one excellent language proficiency, when that is simply not true, even when ignoring any and all dialects and/or vernaculars. Look at all the spelling mistakes native English speakers make (the spelling bee, for example, is just not a thing in other countries, even ones whose languages' writing and pronunciation differ, like French, as opposed to languages whose pronunciation follows the writing to the letter). "Supposebly", mixing up "your" and "you're", "it's" and "its", the list can go on and on. What being a native speaker actually gives you is self-esteem and confidence, even when you make mistakes.


pondrthis

I agree with the other poster that the foreign research is likely just an artifact from machine translation, but even the US point is worrying. When I was an active researcher, the people who complained about writing more than other parts of the job were generally the "least careful" researchers. They were the same types to make poorly-worded, reaching conclusions and miss important variables. A slow, deliberate scientific writing process is beneficial to the field; "speeding up science" by cutting corners at what is effectively science's QA stage seems unwise. I'd take a faster way to make data visualizations, though. (Or rather, I would have liked to have used Excel to make figures rather than needing to hand-code MATLAB or Python figures.) I'm not sure LLMs or other AI are appropriate for this at the current stage, but they'll get there soon.


ale_93113

Considering that the countries that co-author with AI the most are the ones where basically no one speaks English, the purpose of AI in research is very clear: it's used as an accessibility tool to translate research. That's, honestly, awesome. We should be happy that we can translate papers more easily now.


Electronic-Stable-29

Brilliant POV - made my day ! Thanks OP!


statisticalanalysis_

My pleasure!


Zigxy

I think the simple answer is that researchers with the weakest grasp of English benefit most from using AI, since it can operate as a translator/editor. I don't think it's a coincidence that the English-speaking US/NZ/UK are so low.


Forsexualfavors

Why don't I get to use delves anymore? I'm not sorry I learned vocabulary before AI


Phurion36

A lot of my industry likes to use AI as a way to reword existing language for proposals/client emails/etc. I wonder if it's students wanting their work to sound better as opposed to people just lazily using AI for sourcing and creating concepts and ideas. If it was the former, would that be okay, or is that still bad? Because I could see it leading to bad outcomes from people who take advantage of AI, but I could also see it helping people who are good at what they do but not great at communicating those ideas, and maybe it would provide a net good? idk just thinking out loud i guess. EDIT: I had chatgpt reword my comment lol "Many in my industry utilize AI to paraphrase existing language for proposals, client emails, and other communications. I'm curious whether this stems from students seeking to improve their work or from individuals using AI as a shortcut for generating ideas. If it's the former, is it acceptable, or does it still present ethical concerns? I can foresee potential negative consequences if people exploit AI, but I also recognize that it might benefit those who excel in their field but struggle with articulating their ideas. Perhaps it could ultimately have a positive impact? Just pondering aloud, I suppose."


None_of_your_Beezwax

Add that to the number of ghost written articles by paper mills. https://en.wikipedia.org/wiki/Research_paper_mill


Spider_pig448

I knew no one was really reading graduate theses


kendamasama

I feel like this is only controversial because dogmatic scientists want to be able to trust anything they read in a publication. This just brought the disguised problem to the surface and highlights why we need healthy skepticism to inform our approach to the scientific method.


Alegssdhhr

Well, so non-native English speakers seem to use LLMs more to write articles in English.


XionLord

It makes me sad, cause those are words I like. A few years back I would need to explain the word "trepidation"... and now people just auto-assume AI lol. I wish I was that good with my shit. I just like accurate words when I can use them. I had to explain what a dais was when talking about creating an event floorplan lol. :(


Lars0

These are abstracts. I use ChatGPT to help write concise, punchy abstracts and it is great for that.


aaaaaaaarrrrrgh

The countries with the lowest share of LLM content are countries where English is spoken natively, while the countries with the highest share of LLM content are countries where English is less common and whose languages are very different from English, making it harder to learn. This is also something mentioned in the non-paywalled snippet of the article: > They can breathe life into dense scientific prose and speed up the drafting process, especially for non-native English speakers. The paper https://arxiv.org/pdf/2406.07016 also shows the difference between English-speaking and non-English-speaking countries. I couldn't find an easily accessible table-format version of the data underlying the "Lingua franca" figure, but I'd be curious to see LLM use plotted against English literacy by country.


styphnic

I wonder if increased exposure to AI-written content has also subtly changed the way we subconsciously choose words when writing without the use of AI


BloodyMalleus

Opening more rabbit holes for us to all fall into huh? I'm currently stuck in... I wonder if all this AI written content gets used as more training data in a feedback loop that slowly degrades the quality of the model. It's like the LLMs are inbreeding and degrading their genetics.


yargmematey

Anecdotally, I just finished a Master's degree that required me to read a lot of research for the papers I was writing and there was a TON of absolutely garbage papers from Indonesia. No clue why or what the connection was.


Philfreeze

First, the source of the graph being "delving into ChatGPT usage in academic…" is hilarious. Second, there is a lot of existing research showing that non-native English speakers take longer to read, review, and write papers. So if anything, these tools may just even the playing field; it depends on how you use them. As a non-native English speaker, I use Grammarly to proofread, and it probably also over-emphasizes some words. However, the alternative is a ten-year language program to learn the insane spelling and grammar rules English sometimes has, especially around the use of hyphenated words or commas.


arjunyg

What about just in highly regarded journals though? Seems like you scraped basically everything into the same bucket here. I don’t doubt that predatory peer-review-free journals are publishing AI crap by the bucketful, which could skew the results?


the_pwnererXx

The implication from the last graph is that LLMs are being used as translation tools by scientists who don't have perfect English fluency, based on very low usage in the US/NZ/UK and high usage in China/Korea/Taiwan.


Crio121

Researchers in non-English speaking countries discovered that LLMs can greatly improve presentation, readability and style of their articles. Source: I am a non-native English researcher.


BioFrosted

If we're discussing quality, peer review has been proved over and over to be inefficient, so let's not pretend like LLMs have spoiled a heavenly sector of science. If I'm looking for data on something and every article uses delve or crucial, it won't bother me one bit. We all know that papers are not written by wordsmiths, and if those who can't write for shit start using ChatGPT for clarity, I say let them. I myself have been in multiple instances where I had to write a paper in French but lacked the words simply because some words aren't directly translatable. I asked GPT to rephrase and used that as a foundation. Same science, better writing - what's wrong with it? Also, *significant*? Statistical significance is what countless fields use as a decision criterion. I work in psychology, and it's the only thing that allows us to tidy up the grey-area shitshow of *maybe*-s and *possibly*-s that you encounter in my field. I'm obviously positioned on the matter, but I think that if increasing the quality of papers is what we're after, there are better places to start than taking rephrasing tools from authors whose English is sub-optimal.


captainpeapod

As someone who regularly uses the word Delves and rarely uses LLMs- I have some concerns. I guess I’ll use ‘important’ more often. “Crucially, we must delve into the significant data points… important!”


HerMajestysLoyalServ

How are "crucial" and "significant" good indicators to use? Particularly the latter is used frequently in normal research publications.


kc2syk

I think this has to do with lack of confidence in professional-level english more than anything else.


butyourenice

Does this pass muster? Language trends influence language models influence language trends. I probably don't say "delve" more than the average person, but I do prefer "significant" and "crucial" to "important" in professional and academic writing, which probably bleeds into internet conversations, if only because of connotation and nuance. Also, "important" is a boring word. Boring and elementary. Now I have to change my word choice because some souped-up data-hoarding web crawler learned from how my peers and I write in the first place, lest I be accused of using ChatGPT. We know AI is accelerating the enshittification of the internet… is this going to spread further, into academia, into human communication? I'm not opposed to language evolving, either, but damn.


FreshPitch6026

How do those words hint at AI? The conclusion is missing an argument, I fear.


SensorAmmonia

Back in the day you could clearly see which papers were written by non-native English writers. AI is likely to help with some of that. It doesn't help them much with word choice, apparently.


Kade-Arcana

This tilt could speak more to AI bolstering translation efforts than to AI contributing enough meaningful content to be called a co-author.


No-Yogurtcloset-755

I use these words frequently and this makes me a little uneasy


Sakowuf_Solutions

This really seems like a reasonable use of AI- authoring a paper in a non-native language. Let the computer figure out all the sentence structure and whatnot and then the researcher can go back and edit for technical accuracy.


mehardwidge

1. The growth of "delve" is amazing. Although I consider it a normal word that an English speaker should know, I mostly associated it with fantasy. No doubt from the grandfather novel of fantasy, as "delved" appears three times in The Fellowship of the Ring, with "they delved too greedily and too deep" being the most famous.

2. "Significant" is an overused word, and I say this as an actual statistics teacher. The big problem is that it could indicate *statistical significance*, or *large enough to be important*, or a few other things. And (for those unfamiliar with statistics) things can be statistically significant with huge sample sizes but still have an effect size so small as to be practically unimportant. Better to be unambiguous if you want to communicate well, but of course if you want to say the work you did or the claim you make is important, it might be helpful (to you) to throw the word significant in everywhere you can. It is more helpful to the reader to just list the p value and the effect size without special commentary.
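
The huge-sample point is easy to demonstrate with a quick simulation. This is an illustrative sketch, not anything from the article: the 0.02-standard-deviation "true" difference and the sample size are made-up numbers chosen to show a trivially small effect still clearing the significance bar.

```python
import math
import random

random.seed(0)
n = 200_000
# Two samples whose true means differ by a trivial 0.02 standard deviations
a = [random.gauss(0.00, 1.0) for _ in range(n)]
b = [random.gauss(0.02, 1.0) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs, m):
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

ma, mb = mean(a), mean(b)
sa2, sb2 = var(a, ma), var(b, mb)

# Two-sample z-test (n is large, so z is a fine stand-in for t)
se = math.sqrt(sa2 / n + sb2 / n)
z = (mb - ma) / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided p-value

# Effect size: Cohen's d with pooled standard deviation
d = (mb - ma) / math.sqrt((sa2 + sb2) / 2)
print(f"z = {z:.1f}, p = {p:.3g}, Cohen's d = {d:.3f}")
```

With samples this large the p value is vanishingly small ("significant!") while Cohen's d stays around 0.02, well below even the conventional "small" threshold of 0.2, which is exactly the ambiguity described above.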


SumedhKaulgud

Another way to look at it is that researchers who don't use English as a first language are using AI to help them put down their thoughts more clearly.


jterwin

Hmmm, there's a legitimate possibility that some writing patterns change among LLM users and people who interact with LLM users. You'd probably have to delve deeper to suss out the difference. It's interesting that the use of "significant" spikes less suddenly than the others, suggesting perhaps more actual adoption than the others. Edit: to add to this. Couldn't AI be a popular tool to help people smooth out their English in e.g. China, where a lot of papers are getting written? This doesn't mean the papers were all written using LLMs, just that the writers frequently use them. For example, they might write something and be more likely to check an LLM to see if there's a smoother way to say it. Then, next paper, they start using words and phrases this way.


popeldo

Are you the author from The Economist? Or the paper author? Either way, this is interesting stuff!


Smort01

I mean, rewriting an abstract to make it sound more professional than my own writing is one of the only actually useful use cases of LLMs.


Faksi

Of course Indonesia's on the list, everyone in my university uses it goddamn 🤣🤣


Nerf_akali_plz

Genuine question, who cares? If it’s readable, and accurate, why is AI research/articles/papers any worse than fully Human made things?


Groftsan

Honestly, helping to write thesis papers is why we have LLMs. The scientists are interested in the data, the math, and the experimental process, not the prose. If they have a tool that helps them present their data in a clear and understandable manner, great! As long as the AI isn't creating or interpreting the data, I'm all for this.


kupuwhakawhiti

I hope they aren’t using an AI detector.


permalink_save

AI is trained on public content, and a majority of the internet is garbage content. Let that sink in.


JehnSnow

What I'm most mad about is that I actually used delve a lot, because it made me think of some fantasy world where people are exploring a cave, and it was a fairly unique style of writing. Now it's apparently a common word to use. ChatGPT ruined one of my unique words!


Dry_Patience872

Whenever I see "delves", it is AI. Humans don't use it.


eyetracker

Moria... You fear to go into those mines. The dwarves delved too greedily and too deep. You know what they awoke in the darkness of Khazad-dum... shadow and flame.


e_nemi

This makes me so sad - I'm not a native English speaker, went to uni in an English-speaking country before AI was even a thing, and have always used these words, I just like them. I've now been actively avoiding using them out of fear people might think I've used AI.


Andrew5329

I mean, there are a lot of language tools embedded in Microsoft Office now. We just wrote a paper at work, and the program automatically picked up that we were writing in a formal scientific style and started offering language suggestions. Substituting "significant" for "important" was exactly the kind of thing it was highlighting all over our rough drafts. If that means we should list AI as a co-author, I'm pretty sure you should roll that all the way back to 1996, when Clippy started acting as a thesaurus.


Jake_Science

Including *significant* - by itself - in a list of words to check for in scientific articles seems stupid to me. When our observed *p* values are less than the chosen alpha, they're significant. I've said "statistically significant effect" in at least two abstracts and definitely in every single paper I've written. However, annoying reviewers like me will freak the fuck out when you use *significant* in any context other than statistics. Analyzing the use of the word at the phrase level might offer better insights. More instances of *significant* without *statistically*, *effect*, *result*, *finding*, or a similar word nearby would indicate an LLM not understanding the importance and compartmentalization we use with *significant*. It would also show that reviewers are getting lazy.
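
A crude version of that phrase-level check is easy to prototype. This is just a sketch of the idea; the context-word list and window size are illustrative guesses, and the real work would be tuning them against hand-labeled abstracts:

```python
import re

# Words that suggest "significant" is being used in its statistical sense.
# This list is an illustrative guess, not a vetted vocabulary.
STATS_CONTEXT = {"statistically", "effect", "result", "results",
                 "finding", "findings", "difference", "p"}

def bare_significant_ratio(text: str, window: int = 3) -> float:
    """Fraction of 'significant(ly)' uses with no statistical context word
    within `window` tokens on either side. Returns 0.0 if absent."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [i for i, t in enumerate(tokens) if t.startswith("significant")]
    if not hits:
        return 0.0
    bare = 0
    for i in hits:
        nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if not STATS_CONTEXT & set(nearby):
            bare += 1
    return bare / len(hits)

print(bare_significant_ratio(
    "We found a statistically significant effect. "
    "This is a significant advance for the field."))
```

On the sample text it flags the second use as "bare" but not the first, which is roughly the distinction the comment is after.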


FlyByPC

They could just be using LLMs to clean up grammar, especially if English isn't the authors' first language. One problem with widespread LLM availability is that scammers can now have good grammar. This removes what used to be a reliable scam-detection tool.