Bonum Certa Men Certa

The LLM Ouroboros Phenomenon

posted by Roy Schestowitz on May 19, 2025,
updated May 19, 2025

An ouroboros in a 1478 drawing in an alchemical tract

Ancient Greek mythology gave us the concept of the ouroboros, wherein some animal - typically a snake, for a feasible "IRL" (in real life) metaphor - eats itself, starting with its own tail. We would not be the first to point out the analogy for LLMs, because an ouroboros is a good parable. This morning we catalogued two BSD and Linux sites complaining about desperate LLM scrapers staging a DDoS attack in pursuit of original (as in human-written) code or words. This isn't a new problem for us; in the past few days we served about half a million pages over Gemini Protocol, likely due to LLM scrapers. It's obnoxious, to say the least, but distinguishing benign requests from malicious ones (or worthless junk) is hard and a "moving target" (it's never enough, as parasites learn to adapt).

This morning in IRC we made an assertion about LLMs and fake (slop) images. We also made several observations. Fact #1: over time slop gets worse (training set is like some blurry JPEG). Fact #2: People's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.

Are LLMs bound to not only get worse but also more easily detectable by an increasingly sceptical general public? TheLayoff.com has just responded to this.

An associate opines that fact #1 (that slop gets worse over time) is exacerbated by the flood of slop on the Net being snarfed up by newer bots and mistaken for training data. "Thus the feedback loop I mentioned a long time back and which Andy wrote about in depth." (He was referring to Dr. Farnell's good writings about this dilemma - as he did several times in The CyberShow's blog.)
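The feedback loop can be illustrated with a toy simulation. This is only a sketch under loud assumptions: the "model" is a two-parameter Gaussian rather than an LLM, each generation is trained purely on the previous generation's output, and the function name `simulate_feedback_loop`, the sample size and the seed are all made up for illustration.

```python
import random
import statistics


def simulate_feedback_loop(generations=300, sample_size=10, seed=2025):
    """Refit a Gaussian 'model' on its own output, generation after generation.

    Returns the fitted standard deviation at each generation, starting with
    the fit to the original ("human-made") sample.
    """
    rng = random.Random(seed)

    # Generation 0: "real" data, drawn from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]

    spread_history = []
    for _ in range(generations + 1):
        # "Train": estimate the model's two parameters from the current data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        spread_history.append(sigma)

        # "Publish and re-scrape": the next training set is pure model output.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]

    return spread_history


if __name__ == "__main__":
    history = simulate_feedback_loop()
    for generation in (0, 10, 50, 100, 300):
        print(f"generation {generation:3d}: spread = {history[generation]:.3e}")
```

With small samples the fitted spread tends to drift towards zero: each refit loses a little of the original variety and nothing outside the loop ever puts it back. Real scrapers ingest a mix of human and machine text, so the real-world effect would presumably be slower and messier than in this toy.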

To a certain extent my Ph.D. thesis (dissertation) covered this about two decades ago. The associate says that it's a "well-known problem from days of old".

There are several unique aspects to this, including validation bias. To me it seemed a bit related to but not the same as over-training because, as an associate explains, "overtraining is something else: too much data and the patterns become locked too tightly to the training set and less useful for new data".

When an LLM scans its own output online, that serves to affirm its mistakes, or errors, often euphemised as mere "hallucinations" (supposedly innocent, not libellous, and by no means "intentional" or "harmful"). Dr. Farnell and Dr. Kate Brown responded to this last October in "Radical disbelief and its causes".

In the context of my thesis (dissertation), a concern was raised about what we back then called "synthetic data" finding its way "back" into the training set. So when you check brain MRI scans (which is what we did back then) you must ensure you only ever deal with real data, not mock or manipulated data that can confirm your own biases and "fit into" the model that generated it in the first place (in generative mode). To use the analogy of text-based LLMs, your BS is "truth" if your input is your own BS (output/s) and it would be deemed accurate, based on you (opposite of the notion of peer review in science). The associate correctly points out, based on a scan of my thesis (dissertation), that the strings "overtraining" and "over-training" are not in the dissertation, but we used different terms back then.

A squat toilet (also known as an Eastern, Turkish, Iranian or Natural-Position toilet). This one is in Turkey

"An LLM Ouroboros of shit", as the associate dubs it, would be statistical models (such as PDMs or AAMs*) treating computer-generated images as something from "the real world".

The so-called "generative hey hi" (genAI) "bros" won't allow the media to talk about such issues, at least if they can downplay the issues and deny/misportray them (in the media). But it's a real and growing problem. Its magnitude likely grows quadratically, not linearly. Just like other bubbles (overabundance based around hype), don't expect linear implosions. When it's gone (poof!), it's gone.

____

* PDM (point distribution model) and AAM (active appearance model) need expansion in the explanatory sense, not just words (in the acronyms). PDMs go back several decades; they were invented or pioneered by the people who tutored me. They use mathematical, statistical models to perform multidimensional analysis of data variations, based upon principal component analysis (PCA). AAMs are an extension, but with textures, not only points. This is really old stuff; even AAMs are over 23 years old; now the mainstream media pretends those are some kind of "revolution".
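To make the PCA idea behind PDMs a little more concrete, here is a minimal sketch in pure Python. It is not the code from the thesis: it assumes the shapes are already aligned, it extracts only the single strongest mode of variation (by power iteration instead of a full eigendecomposition), and the function names and the triangle data are hypothetical.

```python
import math


def first_mode_of_variation(shapes, iterations=100):
    """Return (mean_shape, unit_direction, variance) of the dominant mode.

    Each shape is a flat list of landmark coordinates [x1, y1, x2, y2, ...].
    The shapes are assumed to be aligned already (no Procrustes step here).
    """
    count = len(shapes)
    mean = [sum(column) / count for column in zip(*shapes)]
    centred = [[value - m for value, m in zip(shape, mean)] for shape in shapes]
    dim = len(mean)

    # Sample covariance matrix of the landmark coordinates.
    cov = [[sum(row[i] * row[j] for row in centred) / (count - 1)
            for j in range(dim)] for i in range(dim)]

    def apply_cov(vector):
        return [sum(cov[i][j] * vector[j] for j in range(dim)) for i in range(dim)]

    # Power iteration: converges to the eigenvector with the largest
    # eigenvalue, unless the starting guess happens to be orthogonal to it.
    direction = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        product = apply_cov(direction)
        norm = math.sqrt(sum(value * value for value in product))
        if norm == 0.0:
            break  # no variation at all in the training shapes
        direction = [value / norm for value in product]

    variance = sum(d * p for d, p in zip(direction, apply_cov(direction)))
    return mean, direction, max(variance, 0.0)


def synthesise(mean, direction, variance, b):
    """Generative mode: mean shape moved b standard deviations along the mode."""
    scale = b * math.sqrt(variance)
    return [m + scale * d for m, d in zip(mean, direction)]


if __name__ == "__main__":
    # Hypothetical training set: a triangle whose apex moves up and down.
    base = [0.0, 0.0, 1.0, 0.0, 0.5, 1.0]
    shapes = [base[:5] + [base[5] + t] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
    mean, direction, variance = first_mode_of_variation(shapes)
    print("mean shape:", mean)
    print("variance along the mode:", variance)
    print("synthetic shape at b=1:", synthesise(mean, direction, variance, 1.0))
```

Anything that `synthesise` produces lies, by construction, along a mode the model already knows. In this toy, at least, appending such output to `shapes` and refitting can only confirm the existing mode; it cannot teach the model anything about the real world.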
