Bonum Certa Men Certa

The LLM Ouroboros Phenomenon

posted by Roy Schestowitz on May 19, 2025,
updated May 19, 2025

An ouroboros in a 1478 drawing in an alchemical tract

Ancient Greek mythology came up with this concept of an ouroboros, wherein some animal - typically a snake for a feasible "IRL" (in real life) metaphor - eats itself by eating its own tail. We would not be the first to point out the analogy here for LLMs because an ouroboros is a good parable. This morning we catalogued two BSD and Linux sites complaining about desperate LLM scrapers staging a DDoS attack in pursuit of original, as in human-written, code or words. This isn't a new problem for us and in the past few days we served about half a million pages in Gemini Protocol, likely due to to LLM scrapers. It's obnoxious to say the least, but distinguishing benign from malicious (or worthless junk) requests is hard and a "moving target" (it's never enough as parasites learn to adapt).

This morning in IRC we made an assertion about LLMs and fake (slop) images. We also made several observations. Fact #1: over time slop gets worse (training set is like some blurry JPEG). Fact #2: People's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.

Are LLMs bound to not only get worse but also more easily detectable by an increasingly sceptical general public? TheLayoff.com has just responded to this.

An associate opines that fact #1 (that slop gets worse over time) is exacerbated by the flood of slop on the Net being snarfed up by newer bots and mistaken for training data. "Thus the feedback loop I mentioned a long time back and which Andy wrote about in depth." (He was referring to Dr. Farnell's good writings about this dilemma - as he did several times in The CyberShow's blog)

To a certain extent my Ph.D. thesis (dissertation) covered this about two decades ago. The associate says that it's a "well-known problem from days of old".

There are several unique aspects to this, including validation bias. To me it seemed a bit related to but not the same as over-training because, as an associate explains, "overtraining is something else: too much data and the patterns become locked too tightly to the training set and less useful for new data".

For an LLM to scan online its own output serves to affirm the mistakes, or the errors, often euphemised as mere "hallucinations", which are innocent, not libellous, and by no means "intentional" and "harmful". Dr. Farnell and Dr. Kate Brown responded to this last October in "Radical disbelief and its causes".

In the context of my thesis (dissertation), a concern was raised about what we back then called "synthetic data" finding its way "back" into the training set. So when you check brain MRI scans (which is what we did back then) you must ensure you only ever deal with real data, not mock or manipulated data that can confirm your own biases and "fit into" the model that generated it in the first place (in generative mode). To use the analogy of text-based LLMs, your BS is "truth" if your input is your own BS (output/s) and it would be deemed accurate, based on you (opposite of the notion of peer review in science). The associate correctly points out, based on a scan of my thesis (dissertation), that the strings "overtraining" and "over-training" are not in the dissertation, but we used different terms back then.

A squat toilet (also known as an Eastern, Turkish, Iranian or Natural-Position toilet). This one is in Turkey

"An LLM Ouroboros of shit", as the associate dubs it, would be statistical models (such as PDMs or AAMs*) treating computer-generated images as something from "the real world".

The so-called "generative hey hi" (genAI) "bros" won't allow the media to talk about such issues, at least if they can downplay the issues and deny/misportray them (in the media). But it's a real and growing problem. Its magnitude likely grows quadratically, not linearly. Just like other bubbles (overabundance based around hype), don't expect linear implosions. When it's gone (poof!), it's gone.

____

* PDM and AAM need expansion in the explanatory sense, not just words (in the acronyms). PDMs go back several decades ago they were invented or pioneered by the people who tutored me. They use mathematical, statistical models to perform multidimensional analysis of data variations, based upon principal component analysis (PCA). AAMs are an extension but with textures, not only points. This is really old stuff; even AAMs are over 23 years old; now the mainstream media pretends those are some kind of "revolution".

Other Recent Techrights' Posts

GNU/Linux Grows at Windows' Expense and Microsoft Trolls Infest and Maliciously Target Articles About It
Microsoft is - and has long been - organised crime
They Say I'm Mr. Bombastic
They didn't take good lawyers
 
Computers Got Smaller, So GNU/Linux Got Bigger
Many people here recognise the lack of urgency (or need) to get expensive new laptops
BetaNews is a Plagiarism and LLM Slop Hub, the Chief Editor Isn't Addressing This Problem Anymore
SS Fagioli is basically a parasite leeching off or exploiting other people's work
Links 09/06/2025: Chaos in Los Angeles and Hurricane Season
Links for the day
Links 09/06/2025: Windows TCO and Many Data Breaches
Links for the day
Abuse Inside the Polish Patent Office (UPRP) - Part VI: Political Stunts by Former President Edyta Demby-Siwek and the Connection to Profound Corruption at EUIPO
it's like a money-laundering operation where one politician rewards another at taxpayers' expense
Gemini Links 09/06/2025: Pipelines and Splitgate
Links for the day
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Sunday, June 08, 2025
IRC logs for Sunday, June 08, 2025
Links 08/06/2025: Tiananmen Carnage Censorship Persists, North Korean Goes Offline
Links for the day
Gemini Links 08/06/2025: Love as an Ethnographic Method and Monitorix Gemini-Frontend v0.1
Links for the day
Links 08/06/2025: Exposure of More GAFAM Surveillance and Social Security Records Compromised
Links for the day
Linux Foundation is a Mediator for Microsoft et al, Not for Small Companies That Support Rather Than Attack the GPL
Many people still wrongly assume that because it is called "Linux Foundation", then it is pro-Linux and represents the same mindset
This Past Friday, Confirming What We Said All Along About Brett Wilson LLP: It's Shrinking, Has Considerable Debt, Loss of Net Assets Despite the Microsoft SLAPP Money
The documents only became publicly available less than 2 days ago
Some of the Many Reasons We Sued Microsofters for Harassment
perpetrators of harassment
For 20 Years Many People Were Sharecropping for Canonical's Oligarch, Now He's Deleting All Their Contributions
"Ubuntu has erased instead of archiving the trove of material at Ubuntu Forums"
There Was Always Too Much 'Crazy Stuff' Going on Around Freenode
What many IRC users lost sight of
Exposing Crime is Not a Crime (It Never Was)
In the eyes of rich and powerful people, those who speak about their crimes are the "criminals"
GNU/Linux Distros Abandoning Microsoft GitHub
Will curl be next to leave Microsoft GitHub?
Expect More XBox Mass Layoffs Soon If the Rumours Are True
From a Microsoft media operative
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Saturday, June 07, 2025
IRC logs for Saturday, June 07, 2025
Europe Needs to Move Away From GAFAM; The Sooner, the Better
Europe - not just the EU - must abandon GAFAM as soon as possible
The Issue Isn't GNOME's Promotion of Diversity But GNOME Corruption, Abuse, Censorship, and Worse
So-called "Conservative" (republican, pro-Trump, bigoted) people want you to think the problem with GNOME is politics
When the News Sources Become Scarce and Increasingly Full of Polluted/Contaminated 'Content' (With LLM Slop and Slop Images)
Integrity matters
"Linux" Sites That Spew Out LLM Slop
We're lacking enough material for another "Slopwatch"
Abuse Inside the Polish Patent Office (UPRP) - Part V: Breaking the Law, Just Like EPO
We'll hopefully cover some of the pertinent details later this year
Links 08/06/2025: Security Lapses, CISA Cuts, and More
Links for the day
Gemini Links 07/06/2025: Mime Types and Geminisphere Introduction
Links for the day
Links 07/06/2025: Slop Companies Retain All Private Data, More Books Banned in the US
Links for the day
Gemini Links 07/06/2025: "A Monk's Guide to Happiness" and "Wireless Earbuds"
Links for the day
Links 07/06/2025: More Rumours of Mass Layoffs in Microsoft's XBox Division, New COVID Variant
Links for the day
Drug Addiction is a Real Problem, It Destroys Families
a rather sensitive matter
Abuse Inside the Polish Patent Office (UPRP) - Part IV: Political Scrutiny and Errors/Inconsistencies in Official Documents
When such organisations receive scrutiny they start focusing on cover-up and muzzling of facts (or crushing people who say the truth)
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Friday, June 06, 2025
IRC logs for Friday, June 06, 2025