Bonum Certa Men Certa

The LLM Ouroboros Phenomenon

posted by Roy Schestowitz on May 19, 2025,
updated May 19, 2025

An ouroboros in a 1478 drawing in an alchemical tract

Ancient Greek mythology came up with this concept of an ouroboros, wherein some animal - typically a snake for a feasible "IRL" (in real life) metaphor - eats itself by eating its own tail. We would not be the first to point out the analogy here for LLMs because an ouroboros is a good parable. This morning we catalogued two BSD and Linux sites complaining about desperate LLM scrapers staging a DDoS attack in pursuit of original, as in human-written, code or words. This isn't a new problem for us and in the past few days we served about half a million pages in Gemini Protocol, likely due to to LLM scrapers. It's obnoxious to say the least, but distinguishing benign from malicious (or worthless junk) requests is hard and a "moving target" (it's never enough as parasites learn to adapt).

This morning in IRC we made an assertion about LLMs and fake (slop) images. We also made several observations. Fact #1: over time slop gets worse (training set is like some blurry JPEG). Fact #2: People's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.

Are LLMs bound to not only get worse but also more easily detectable by an increasingly sceptical general public? TheLayoff.com has just responded to this.

An associate opines that fact #1 (that slop gets worse over time) is exacerbated by the flood of slop on the Net being snarfed up by newer bots and mistaken for training data. "Thus the feedback loop I mentioned a long time back and which Andy wrote about in depth." (He was referring to Dr. Farnell's good writings about this dilemma - as he did several times in The CyberShow's blog)

To a certain extent my Ph.D. thesis (dissertation) covered this about two decades ago. The associate says that it's a "well-known problem from days of old".

There are several unique aspects to this, including validation bias. To me it seemed a bit related to but not the same as over-training because, as an associate explains, "overtraining is something else: too much data and the patterns become locked too tightly to the training set and less useful for new data".

For an LLM to scan online its own output serves to affirm the mistakes, or the errors, often euphemised as mere "hallucinations", which are innocent, not libellous, and by no means "intentional" and "harmful". Dr. Farnell and Dr. Kate Brown responded to this last October in "Radical disbelief and its causes".

In the context of my thesis (dissertation), a concern was raised about what we back then called "synthetic data" finding its way "back" into the training set. So when you check brain MRI scans (which is what we did back then) you must ensure you only ever deal with real data, not mock or manipulated data that can confirm your own biases and "fit into" the model that generated it in the first place (in generative mode). To use the analogy of text-based LLMs, your BS is "truth" if your input is your own BS (output/s) and it would be deemed accurate, based on you (opposite of the notion of peer review in science). The associate correctly points out, based on a scan of my thesis (dissertation), that the strings "overtraining" and "over-training" are not in the dissertation, but we used different terms back then.

A squat toilet (also known as an Eastern, Turkish, Iranian or Natural-Position toilet). This one is in Turkey

"An LLM Ouroboros of shit", as the associate dubs it, would be statistical models (such as PDMs or AAMs*) treating computer-generated images as something from "the real world".

The so-called "generative hey hi" (genAI) "bros" won't allow the media to talk about such issues, at least if they can downplay the issues and deny/misportray them (in the media). But it's a real and growing problem. Its magnitude likely grows quadratically, not linearly. Just like other bubbles (overabundance based around hype), don't expect linear implosions. When it's gone (poof!), it's gone.

____

* PDM and AAM need expansion in the explanatory sense, not just words (in the acronyms). PDMs go back several decades ago they were invented or pioneered by the people who tutored me. They use mathematical, statistical models to perform multidimensional analysis of data variations, based upon principal component analysis (PCA). AAMs are an extension but with textures, not only points. This is really old stuff; even AAMs are over 23 years old; now the mainstream media pretends those are some kind of "revolution".

Other Recent Techrights' Posts

Parties and Milestones Again
we've begun putting up about 40 balloons
 
Links 28/10/2025: Mass Layoffs at Amazon and Charter to Cut 1,200 Jobs
Links for the day
The Cocaine Patent Office - Part II: The Person Who Planted Paid-for Fake News for the European Patent Office (EPO) is a Cocaine User, Friend of António Campinos, Now on Record as Having Been Arrested
Background: High-level manager at the European Patent Office caught in public with cocaine, arrested
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Monday, October 27, 2025
IRC logs for Monday, October 27, 2025
Google News Drowning in Slop (and Slopfarms That Hijack About Half the Results)
Google News seems to be drowning in this stuff
Gemini Links 28/10/2025: "How to Maximize Your Positive Impact" and ASCII Art and Artist Attribution
Links for the day
PETA and Activism
Being staff or volunteer in PETA isn't easy
Big Blue, Huge Debt
debt will soar again
Links 27/10/2025: Mass Surveillance Sold as "AI", People Reluctant to Lose Physical Media
Links for the day
Techrights' 19th Anniversary: Bronze
Time to go back to preparing for this anniversary
Our Latest European Patent Office (EPO) Series Will Last Several Weeks, Will Ask the EPO Management and the European Union (EU) Very Difficult Questions
If nobody loses a job (or jobs) over this, then the EU basically became no better than Colombia or Nicaragua
Slopwatch: LinuxSecurity, UbuntuPIT, Brian Fagioli, and Google News
We focus on stories that are fake or LLM slop that disguises itself as "news" about Linux
Links 27/10/2025: Wikipedia Vandalism, Bruce Perens Opens up on Childhood
Links for the day
This Site Could Not be Done by LLMs Even If It Wanted to (Because It's Not a Parrot of What Other Sites Say)
LLMs have no knowledge or deep understanding
Microsoft is Disloyal Towards Its Most Loyal Employees
Against its most faithful enablers
19 Years, No Censorship
No factual information is ever going to be removed, more so if it is in the public interest
We Are Not a Conventional Site, That's Why They Hate (or Love) Us
Throughout the week this week we'll be focusing on the EPO
Following the Line of Cocaine All the Way to the Top
Even a million denials and spin-doctoring won't distract from the core issue
The Cocaine Patent Office - Part I: António Campinos Brought Corruption and Nepotism to the EPO, Then Came the Cocaine
High-level manager at the European Patent Office (EPO) caught in public with cocaine, the Office has some answering to do
Purchasing/Possessing Computers Isn't the Same as Controlling Computers
Let's strive to put computers back under the control of their users, no matter who purchased these (usually the users)
Gemini Links 27/10/2025: Alhena 5.4.3 and Fixing Bash
Links for the day
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Sunday, October 26, 2025
IRC logs for Sunday, October 26, 2025
Thankfully We've Made Copies of More Interesting Data From statCounter
If statCounter (the Web site or the 'webapp') vanished overnight, we'd still have something left of it
More Silent Layoffs at IBM/Red Hat
when the media counts such layoffs or presents tallies the numbers are very incomplete
Links 26/10/2025: Microsoft Spies on Gamers, Open Transport Community Conference
Links for the day
Links 26/10/2025: LLM Slop / Plagiarism Programs Continue to Disappoint, CISA Layoffs Threaten Systems
Links for the day
Gemini Links 26/10/2025: Gemsync and Joining the Small Web
Links for the day
India.com a Click-baiting, SEO-Spamming, Slopfarming Heap
They do this almost every day
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Saturday, October 25, 2025
IRC logs for Saturday, October 25, 2025
Without XBox Consoles, XBox is No More, It's Just a Brand (More Rumours of Microsoft Ending XBox, Then Laying Off Lots of Staff)
All signs indicate that Microsoft wants to "exit" the XBox business (not brand), but it does not want to publicly admit this as it would alarm staff and shareholders