Bonum Certa Men Certa

The LLM Ouroboros Phenomenon

posted by Roy Schestowitz on May 19, 2025,
updated May 19, 2025

An ouroboros in a 1478 drawing in an alchemical tract

Ancient Greek mythology gave us the concept of an ouroboros, wherein some animal - typically a snake, for a feasible "IRL" (in real life) metaphor - eats itself by swallowing its own tail. We would not be the first to point out the analogy to LLMs, as the ouroboros is a good parable. This morning we catalogued two BSD and Linux sites complaining about desperate LLM scrapers staging a DDoS attack in pursuit of original (as in human-written) code or words. This isn't a new problem for us; in the past few days we served about half a million pages over Gemini Protocol, likely due to LLM scrapers. It's obnoxious, to say the least, but distinguishing benign requests from malicious ones (or worthless junk) is hard and a "moving target" (it's never enough, as parasites learn to adapt).

This morning in IRC we made an assertion about LLMs and fake (slop) images. We also made several observations. Fact #1: over time slop gets worse (training set is like some blurry JPEG). Fact #2: People's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.

Are LLMs bound to not only get worse but also more easily detectable by an increasingly sceptical general public? TheLayoff.com has just responded to this.

An associate opines that fact #1 (that slop gets worse over time) is exacerbated by the flood of slop on the Net being snarfed up by newer bots and mistaken for training data. "Thus the feedback loop I mentioned a long time back and which Andy wrote about in depth." (He was referring to Dr. Farnell's good writings about this dilemma, as he did several times in The CyberShow's blog.)

To a certain extent my Ph.D. thesis (dissertation) covered this about two decades ago. The associate says that it's a "well-known problem from days of old".

There are several unique aspects to this, including validation bias. To me it seemed related to, but not the same as, over-training because, as an associate explains, "overtraining is something else: too much data and the patterns become locked too tightly to the training set and less useful for new data".

For an LLM to scan its own output online serves to affirm the mistakes, or the errors, often euphemised as mere "hallucinations", as though they were innocent, not libellous, and by no means "intentional" or "harmful". Dr. Farnell and Dr. Kate Brown responded to this last October in "Radical disbelief and its causes".

In the context of my thesis (dissertation), a concern was raised about what we back then called "synthetic data" finding its way "back" into the training set. So when you check brain MRI scans (which is what we did back then) you must ensure you only ever deal with real data, not mock or manipulated data that can confirm your own biases and "fit into" the model that generated it in the first place (in generative mode). To use the analogy of text-based LLMs, your BS is "truth" if your input is your own BS (output/s) and it would be deemed accurate, based on you (opposite of the notion of peer review in science). The associate correctly points out, based on a scan of my thesis (dissertation), that the strings "overtraining" and "over-training" are not in the dissertation, but we used different terms back then.
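
The loop described above, where generated data finds its way "back" into the training set, can be illustrated with a toy model. What follows is a minimal sketch under loud assumptions, not anything taken from the thesis or from a real LLM pipeline: the "model" is merely a normal distribution fitted to ten numbers, and the names `simulate_feedback_loop` and `fresh_real_data` are hypothetical, chosen only for this illustration.

```python
import random
import statistics


def simulate_feedback_loop(generations=200, sample_size=10, seed=0,
                           fresh_real_data=False):
    """Fit a normal distribution again and again, one fit per generation.

    With fresh_real_data=False each generation is trained only on samples
    drawn from the previous generation's fit (the ouroboros case).
    With fresh_real_data=True each generation sees new "real" samples.
    Returns the fitted standard deviation of every generation.
    """
    rng = random.Random(seed)

    def real_sample():
        # Assumption: the "real world" is a standard normal distribution.
        return [rng.gauss(0.0, 1.0) for _ in range(sample_size)]

    data = real_sample()  # generation 0 always starts from real data
    history = []
    for _ in range(generations + 1):
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # maximum-likelihood estimate
        history.append(sigma)
        if fresh_real_data:
            data = real_sample()
        else:
            # The model's own output becomes the next training set.
            data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return history


closed_loop = simulate_feedback_loop()
open_loop = simulate_feedback_loop(fresh_real_data=True)
print("closed loop, first and last spread:", closed_loop[0], closed_loop[-1])
print("fresh data,  first and last spread:", open_loop[0], open_loop[-1])
```

In this toy setting the closed loop's fitted spread tends to shrink toward zero over the generations, because each refit loses a little of the variation and nothing real ever comes in to restore it, whereas the run fed with fresh real samples keeps hovering around the true spread. It is only an analogy for the text-based case, but it shows why a model that validates itself against its own output will "confirm" whatever it already produces.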

A squat toilet (also known as an Eastern, Turkish, Iranian or Natural-Position toilet). This one is in Turkey

"An LLM Ouroboros of shit", as the associate dubs it, would be statistical models (such as PDMs or AAMs*) treating computer-generated images as something from "the real world".

The so-called "generative hey hi" (genAI) "bros" won't allow the media to talk about such issues, at least if they can downplay the issues and deny/misportray them (in the media). But it's a real and growing problem. Its magnitude likely grows quadratically, not linearly. Just like other bubbles (overabundance based around hype), don't expect linear implosions. When it's gone (poof!), it's gone.

____

* PDM (Point Distribution Model) and AAM (Active Appearance Model) need expansion in the explanatory sense, not just words (in the acronyms). PDMs go back several decades; they were invented or pioneered by the people who tutored me. They use mathematical, statistical models to perform multidimensional analysis of data variations, based upon principal component analysis (PCA). AAMs are an extension, but with textures, not only points. This is really old stuff; even AAMs are over 23 years old. Now the mainstream media pretends those are some kind of "revolution".
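
To make the PCA-based idea in the footnote more concrete, here is a minimal sketch of a point-based statistical shape model in plain Python. It is an illustration under stated assumptions, not the code used in the thesis: shapes are assumed to be already aligned and stored as flat lists `[x1, y1, x2, y2, ...]`, only the single largest mode of variation is extracted (by power iteration rather than a full eigendecomposition), and every function name and the toy square data are hypothetical.

```python
import math


def mean_shape(shapes):
    """Average each landmark coordinate over all training shapes."""
    n = len(shapes)
    return [sum(column) / n for column in zip(*shapes)]


def covariance(shapes, mean):
    """Sample covariance matrix of the landmark coordinates."""
    n = len(shapes)
    dim = len(mean)
    devs = [[x - m for x, m in zip(shape, mean)] for shape in shapes]
    return [[sum(d[i] * d[j] for d in devs) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def first_mode(cov, iterations=100):
    """Dominant eigenvector and eigenvalue of cov via power iteration."""
    dim = len(cov)
    # Start from the row with the largest variance; it lies in the
    # span of the data, so it cannot be stuck in the null space.
    start = max(range(dim), key=lambda i: cov[i][i])
    v = list(cov[start])
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("training shapes show no variation")
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dim))
                     for i in range(dim))
    return v, eigenvalue


def synthesize(mean, mode, b):
    """Generative mode: mean shape plus b times the mode of variation."""
    return [m + b * p for m, p in zip(mean, mode)]


# Toy training set (assumed): a unit square that gets wider or narrower.
base = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
widen = [-1.0, 0.0, 1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
training = [[c + 0.1 * t * w for c, w in zip(base, widen)]
            for t in (-2, -1, 0, 1, 2)]

mean = mean_shape(training)
mode, variance = first_mode(covariance(training, mean))
print("synthetic shape:", synthesize(mean, mode, 2.0 * math.sqrt(variance)))
```

The last line is the "generative mode" mentioned in the article: the model emits a plausible new shape that, by construction, fits the model perfectly. Feeding such synthetic shapes back in as though they were measurements from "the real world" is exactly the contamination the article warns about.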
