Bonum Certa Men Certa

The LLM Ouroboros Phenomenon

posted by Roy Schestowitz on May 19, 2025,
updated May 19, 2025

An ouroboros in a 1478 drawing in an alchemical tract

Ancient Greek mythology came up with this concept of an ouroboros, wherein some animal - typically a snake for a feasible "IRL" (in real life) metaphor - eats itself by eating its own tail. We would not be the first to point out the analogy here for LLMs because an ouroboros is a good parable. This morning we catalogued two BSD and Linux sites complaining about desperate LLM scrapers staging a DDoS attack in pursuit of original, as in human-written, code or words. This isn't a new problem for us and in the past few days we served about half a million pages in Gemini Protocol, likely due to to LLM scrapers. It's obnoxious to say the least, but distinguishing benign from malicious (or worthless junk) requests is hard and a "moving target" (it's never enough as parasites learn to adapt).

This morning in IRC we made an assertion about LLMs and fake (slop) images. We also made several observations. Fact #1: over time slop gets worse (training set is like some blurry JPEG). Fact #2: People's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.

Are LLMs bound to not only get worse but also more easily detectable by an increasingly sceptical general public? TheLayoff.com has just responded to this.

An associate opines that fact #1 (that slop gets worse over time) is exacerbated by the flood of slop on the Net being snarfed up by newer bots and mistaken for training data. "Thus the feedback loop I mentioned a long time back and which Andy wrote about in depth." (He was referring to Dr. Farnell's good writings about this dilemma - as he did several times in The CyberShow's blog)

To a certain extent my Ph.D. thesis (dissertation) covered this about two decades ago. The associate says that it's a "well-known problem from days of old".

There are several unique aspects to this, including validation bias. To me it seemed a bit related to but not the same as over-training because, as an associate explains, "overtraining is something else: too much data and the patterns become locked too tightly to the training set and less useful for new data".

For an LLM to scan online its own output serves to affirm the mistakes, or the errors, often euphemised as mere "hallucinations", which are innocent, not libellous, and by no means "intentional" and "harmful". Dr. Farnell and Dr. Kate Brown responded to this last October in "Radical disbelief and its causes".

In the context of my thesis (dissertation), a concern was raised about what we back then called "synthetic data" finding its way "back" into the training set. So when you check brain MRI scans (which is what we did back then) you must ensure you only ever deal with real data, not mock or manipulated data that can confirm your own biases and "fit into" the model that generated it in the first place (in generative mode). To use the analogy of text-based LLMs, your BS is "truth" if your input is your own BS (output/s) and it would be deemed accurate, based on you (opposite of the notion of peer review in science). The associate correctly points out, based on a scan of my thesis (dissertation), that the strings "overtraining" and "over-training" are not in the dissertation, but we used different terms back then.

A squat toilet (also known as an Eastern, Turkish, Iranian or Natural-Position toilet). This one is in Turkey

"An LLM Ouroboros of shit", as the associate dubs it, would be statistical models (such as PDMs or AAMs*) treating computer-generated images as something from "the real world".

The so-called "generative hey hi" (genAI) "bros" won't allow the media to talk about such issues, at least if they can downplay the issues and deny/misportray them (in the media). But it's a real and growing problem. Its magnitude likely grows quadratically, not linearly. Just like other bubbles (overabundance based around hype), don't expect linear implosions. When it's gone (poof!), it's gone.

____

* PDM and AAM need expansion in the explanatory sense, not just words (in the acronyms). PDMs go back several decades ago they were invented or pioneered by the people who tutored me. They use mathematical, statistical models to perform multidimensional analysis of data variations, based upon principal component analysis (PCA). AAMs are an extension but with textures, not only points. This is really old stuff; even AAMs are over 23 years old; now the mainstream media pretends those are some kind of "revolution".

Other Recent Techrights' Posts

European Patent Office (EPO) Series: Battistelli's "Baltic Crusader"
Gilles Requena, Battistelli's erstwhile "Baltic Crusader" and the loyal servant of his successor Campinos
 
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Sunday, June 14, 2026
IRC logs for Sunday, June 14, 2026
Links 14/06/2026: Energy Cost and Reality Strikes at Heart of Slop Bubble, 75 Data Center Build-outs "Successfully Blocked"
Links for the day
Microsoft CEO Says XBox is Not a Sustainable Business
"Now, we have to turn this into a sustainable business," he said about XBox
MElon (MUSK, Elon) is a Trillionaire Like Penguins Are Mammals
Have media outlets told the truth?
Unlikely Heroes
One personal hero who is not alive (anymore) is Navalny
Bruce Schneier Was Probably Wrong About Slop
Right now politicians who openly speak in favour of slop are committing "political suicide"
SLAPP Censorship - Part 106 Out of 200: 100 Kilograms of Legal Papers
When one party's communications and filings weigh at about 3 KG of paper and another's... at about 100 KG of paper
Links 14/06/2026: More Google Layoffs, Wall Street Deems Companies That Lose Money "Worth" Trillions
Links for the day
Gemini Links 14/06/2026: "The Universe is a Hologram", "Matrix Brain Download", and "Happy 0th Year"
Links for the day
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Saturday, June 13, 2026
IRC logs for Saturday, June 13, 2026
Links 13/06/2026: University of Nottingham Confirms Data/System Breach, Courts Fuming at Fraudulent Lawyers Who Fling LLM Slop at Them
Links for the day
Gemini Links 13/06/2026: World Cups and 做人
Links for the day
Microsoft's XBox "Bloodbath" Seems to Have Already Begun (Informally), Studios Allegedly to Face Shutdowns, Layoff Notices Handed Out, 100% Layoffs in Some Cases, 10% in Others or on Average
So is a complete closure/shutdown imminent? (Compulsion Games in this case)
Discussing Morale at IBM and Conversations Regarding IBM Layoffs (Disguised as Other Things)
Trolling can be a form of censorship
European Patent Office (EPO) Series: All the President's Men
Gilles Requena,Patrice Pellegrino, and Sandro Mendonça
SUEPO Elections Coming Up, Union Leaders at Europe's Second-Largest Institution (EPO) to be Determined Soon
The staff union of the European Patent Office (SUEPO) is having an election soon
SLAPP Censorship - Part 105 Out of 200: When Bad Legal Advice Results in Your Client, Dale Vince, Ordered to Pay £600k - or 801,930 United States Dollar (USD) - to the Person Frivolously Sued (Lord Bailey of Paddington)
"A judge has ruled that Dale Vince must pay punitive costs to Lord Bailey of Paddington, the Tory peer, over the 'unexplained abandonment' of his" SLAPP
How Long for Can American Taxpayers Justify Bailing Out Microsoft?
How many times need the American taxpayers give Microsoft money for vapourware that's neither necessary nor delivered?
IBM is Importing/Exporting Corporations' Regime of Censorship (Hiding the Wrongdoing) to Free Software Communities
Is IBM protecting criminals in the name of "manners"?
Links 13/06/2026: Microsoft’s XBox Crisis and "Apple Deepfakes"
Links for the day
Gemini Links 13/06/2026: Why Humans Are Mostly Right Handed and "Getting Things Done"
Links for the day
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Friday, June 12, 2026
IRC logs for Friday, June 12, 2026