Bonum Certa Men Certa

The LLM Ouroboros Phenomenon

posted by Roy Schestowitz on May 19, 2025,
updated May 19, 2025

An ouroboros in a 1478 drawing in an alchemical tract

Ancient Greek mythology came up with this concept of an ouroboros, wherein some animal - typically a snake for a feasible "IRL" (in real life) metaphor - eats itself by eating its own tail. We would not be the first to point out the analogy here for LLMs because an ouroboros is a good parable. This morning we catalogued two BSD and Linux sites complaining about desperate LLM scrapers staging a DDoS attack in pursuit of original, as in human-written, code or words. This isn't a new problem for us; in the past few days we served about half a million pages over Gemini Protocol, likely due to LLM scrapers. It's obnoxious to say the least, but distinguishing benign requests from malicious ones (or worthless junk) is hard and a "moving target" (no defence is ever enough, as parasites learn to adapt).

This morning in IRC we made an assertion about LLMs and fake (slop) images. We also made several observations. Fact #1: over time slop gets worse (training set is like some blurry JPEG). Fact #2: People's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.

Are LLMs bound to not only get worse but also more easily detectable by an increasingly sceptical general public? TheLayoff.com has just responded to this.

An associate opines that fact #1 (that slop gets worse over time) is exacerbated by the flood of slop on the Net being snarfed up by newer bots and mistaken for genuine training data. "Thus the feedback loop I mentioned a long time back and which Andy wrote about in depth." (He was referring to Dr. Farnell's good writings about this dilemma, which he covered several times in The CyberShow's blog.)
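The feedback loop the associate describes can be illustrated with a toy simulation. This is only a sketch under loud assumptions: the "model" here is just a Gaussian fitted to 10 numbers, not an LLM, and each generation is trained only on the output of the previous one. The function name simulate_feedback_loop and all of its parameters are made up for illustration.

```python
import random
import statistics


def simulate_feedback_loop(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation per generation, starting with
    the original distribution (mean 0, standard deviation 1).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for original, human-made data
    history = [sigma]
    for _ in range(generations):
        # The current model publishes output; the next model trains only on it.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # maximum-likelihood estimate
        history.append(sigma)
    return history


history = simulate_feedback_loop()
print(f"spread at generation 0: {history[0]:.3f}")
print(f"spread at generation {len(history) - 1}: {history[-1]:.3e}")
```

In this toy setting the fitted spread tends to shrink generation after generation, because every refit loses a little of the variety that was present in the original data and nothing new ever comes in to restore it.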

To a certain extent my Ph.D. thesis (dissertation) covered this about two decades ago. The associate says that it's a "well-known problem from days of old".

There are several unique aspects to this, including validation bias. To me it seemed a bit related to but not the same as over-training because, as an associate explains, "overtraining is something else: too much data and the patterns become locked too tightly to the training set and less useful for new data".
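To make the associate's description of overtraining concrete, here is a minimal sketch, assuming a nearest-neighbour regressor as a stand-in for a model that memorises its training set. The data (a noisy sine curve), the helper names make_data, knn_predict and mse, and the choice of k are all hypothetical and picked only for illustration.

```python
import math
import random


def make_data(rng, n):
    """Noisy samples of sin(x) on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN model built on `train`, scored on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train, fresh = make_data(rng, 50), make_data(rng, 50)

for k in (1, 7):
    print(f"k={k}: error on training set {mse(train, train, k):.3f}, "
          f"on new data {mse(train, fresh, k):.3f}")
```

With k=1 the model reproduces its training set perfectly (zero error), yet it still makes errors on new data, which is the "locked too tightly to the training set" behaviour. A smoother setting such as k=7 gives up the perfect training score and would typically be expected to cope better with unseen points.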

When an LLM scans its own output online, this serves to affirm the mistakes, or the errors, often euphemised as mere "hallucinations", which are supposedly innocent, not libellous, and by no means "intentional" or "harmful". Dr. Farnell and Dr. Kate Brown responded to this last October in "Radical disbelief and its causes".

In the context of my thesis (dissertation), a concern was raised about what we back then called "synthetic data" finding its way "back" into the training set. So when you check brain MRI scans (which is what we did back then) you must ensure you only ever deal with real data, not mock or manipulated data that can confirm your own biases and "fit into" the model that generated it in the first place (in generative mode). To use the analogy of text-based LLMs, your BS is "truth" if your input is your own BS (output/s) and it would be deemed accurate, based on you (opposite of the notion of peer review in science). The associate correctly points out, based on a scan of my thesis (dissertation), that the strings "overtraining" and "over-training" are not in the dissertation, but we used different terms back then.

A squat toilet (also known as an Eastern, Turkish, Iranian or Natural-Position toilet). This one is in Turkey

"An LLM Ouroboros of shit", as the associate dubs it, would be statistical models (such as PDMs or AAMs*) treating computer-generated images as something from "the real world".

The so-called "generative hey hi" (genAI) "bros" won't allow the media to talk about such issues, at least if they can downplay the issues and deny/misportray them (in the media). But it's a real and growing problem. Its magnitude likely grows quadratically, not linearly. Just like other bubbles (overabundance based around hype), don't expect linear implosions. When it's gone (poof!), it's gone.

____

* PDM (point distribution model) and AAM (active appearance model) need expansion in the explanatory sense, not just the words in the acronyms. PDMs go back several decades; they were invented or pioneered by the people who tutored me. They use mathematical, statistical models to perform multidimensional analysis of data variations, based upon principal component analysis (PCA). AAMs are an extension, but with textures, not only points. This is really old stuff; even AAMs are over 23 years old, yet the mainstream media now pretends those are some kind of "revolution".
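For readers who want to see the idea rather than the acronym, below is a rough sketch of a point distribution model in plain Python. It is not the code used in the thesis. It assumes a hypothetical training set of triangles (3 landmarks, x/y interleaved) whose apex moves up and down, and it uses power iteration as a simple substitute for a full PCA. All names (mean_shape, covariance, first_mode) are illustrative.

```python
import math
import random


def mean_shape(shapes):
    """Average each landmark coordinate over all training shapes."""
    n = len(shapes)
    return [sum(s[i] for s in shapes) / n for i in range(len(shapes[0]))]


def covariance(shapes, mean):
    """Sample covariance matrix of the landmark coordinates."""
    d, n = len(mean), len(shapes)
    centred = [[s[i] - mean[i] for i in range(d)] for s in shapes]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_mode(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical training data: only the apex height (last coordinate) really
# varies; every coordinate also gets a little landmark noise.
rng = random.Random(0)
base = [0.0, 0.0, 2.0, 0.0, 1.0, 1.0]
true_mode = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
shapes = []
for _ in range(100):
    b = rng.gauss(0.0, 0.5)
    shapes.append([base[i] + b * true_mode[i] + rng.gauss(0.0, 0.01)
                   for i in range(6)])

mean = mean_shape(shapes)
mode = first_mode(covariance(shapes, mean))

# Generative use: a new shape is the mean plus a weighted mode of variation.
new_shape = [mean[i] + 0.8 * mode[i] for i in range(6)]
print([round(x, 2) for x in mode])
```

The recovered mode should point almost entirely along the apex height, the one thing that actually varies in the training shapes. The last step is the "generative mode" mentioned earlier: shapes synthesised this way fit the model perfectly by construction, which is why feeding them back in as if they were real observations only confirms what the model already believes.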
