Bonum Certa Men Certa

The LLM Ouroboros Phenomenon

posted by Roy Schestowitz on May 19, 2025,
updated May 19, 2025

An ouroboros in a 1478 drawing in an alchemical tract

Ancient Greek mythology came up with this concept of an ouroboros, wherein some animal - typically a snake for a feasible "IRL" (in real life) metaphor - eats itself by eating its own tail. We would not be the first to point out the analogy here for LLMs because an ouroboros is a good parable. This morning we catalogued two BSD and Linux sites complaining about desperate LLM scrapers staging a DDoS attack in pursuit of original, as in human-written, code or words. This isn't a new problem for us and in the past few days we served about half a million pages in Gemini Protocol, likely due to LLM scrapers. It's obnoxious to say the least, but distinguishing benign from malicious (or worthless junk) requests is hard and a "moving target" (it's never enough as parasites learn to adapt).

This morning in IRC we made an assertion about LLMs and fake (slop) images. We also made several observations. Fact #1: over time slop gets worse (training set is like some blurry JPEG). Fact #2: People's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.

Are LLMs bound to not only get worse but also more easily detectable by an increasingly sceptical general public? TheLayoff.com has just responded to this.

An associate opines that fact #1 (that slop gets worse over time) is exacerbated by the flood of slop on the Net being snarfed up by newer bots and mistaken for training data. "Thus the feedback loop I mentioned a long time back and which Andy wrote about in depth." (He was referring to Dr. Farnell's good writings about this dilemma - as he did several times in The CyberShow's blog)
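To make that feedback loop a bit more concrete, here is a minimal sketch of it in Python. It is a toy, not a description of how any real LLM is trained: the "model" is just a table of token frequencies, and the function name `retrain_on_own_output`, the 50-token vocabulary and the sample sizes are all assumptions made up for illustration. Each generation is fitted only to what the previous generation produced, so any token that happens not to be sampled is lost for good.

```python
import random
from collections import Counter


def retrain_on_own_output(weights, sample_size, generations, seed=0):
    """Repeatedly 'train' a toy frequency model on samples drawn from itself.

    Returns the number of distinct tokens the model can still produce after
    each generation (index 0 is the original, human-made data).
    """
    rng = random.Random(seed)
    tokens = list(weights)
    probs = [weights[t] for t in tokens]
    support_sizes = [len(tokens)]
    for _ in range(generations):
        # The current model generates output ...
        output = rng.choices(tokens, weights=probs, k=sample_size)
        # ... and the next model is fitted to that output alone.
        counts = Counter(output)
        tokens = list(counts)
        probs = [counts[t] for t in tokens]
        support_sizes.append(len(tokens))
    return support_sizes


# Hypothetical vocabulary: a few common tokens and a long tail of rare ones.
weights = {f"w{i}": 1.0 / (i + 1) for i in range(50)}
sizes = retrain_on_own_output(weights, sample_size=200, generations=30)
print(sizes)
```

Whatever the seed, the list can only stay level or shrink: a token with zero count has zero probability in the next generation and can never come back. The rare material goes first, which is roughly the "blurry JPEG" effect, assuming one accepts a frequency table as a stand-in for a far more complicated model.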

To a certain extent my Ph.D. thesis (dissertation) covered this about two decades ago. The associate says that it's a "well-known problem from days of old".

There are several unique aspects to this, including validation bias. To me it seemed a bit related to but not the same as over-training because, as an associate explains, "overtraining is something else: too much data and the patterns become locked too tightly to the training set and less useful for new data".

For an LLM to scan online its own output serves to affirm the mistakes, or the errors, often euphemised as mere "hallucinations", which are innocent, not libellous, and by no means "intentional" and "harmful". Dr. Farnell and Dr. Kate Brown responded to this last October in "Radical disbelief and its causes".

In the context of my thesis (dissertation), a concern was raised about what we back then called "synthetic data" finding its way "back" into the training set. So when you check brain MRI scans (which is what we did back then) you must ensure you only ever deal with real data, not mock or manipulated data that can confirm your own biases and "fit into" the model that generated it in the first place (in generative mode). To use the analogy of text-based LLMs, your BS is "truth" if your input is your own BS (output/s) and it would be deemed accurate, based on you (opposite of the notion of peer review in science). The associate correctly points out, based on a scan of my thesis (dissertation), that the strings "overtraining" and "over-training" are not in the dissertation, but we used different terms back then.

A squat toilet (also known as an Eastern, Turkish, Iranian or Natural-Position toilet). This one is in Turkey

"An LLM Ouroboros of shit", as the associate dubs it, would be statistical models (such as PDMs or AAMs*) treating computer-generated images as something from "the real world".

The so-called "generative hey hi" (genAI) "bros" won't allow the media to talk about such issues, at least if they can downplay the issues and deny/misportray them (in the media). But it's a real and growing problem. Its magnitude likely grows quadratically, not linearly. Just like other bubbles (overabundance based around hype), don't expect linear implosions. When it's gone (poof!), it's gone.

____

* PDM and AAM need expansion in the explanatory sense, not just words (in the acronyms). PDMs (point distribution models) go back several decades; they were invented or pioneered by the people who tutored me. They use mathematical, statistical models to perform multidimensional analysis of data variations, based upon principal component analysis (PCA). AAMs (active appearance models) are an extension, but with textures and not only points. This is really old stuff; even AAMs are over 23 years old; now the mainstream media pretends those are some kind of "revolution".
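For readers who want the PCA part spelled out, below is a rough sketch of the idea behind a PDM in plain Python. It is a simplification under stated assumptions: real PDMs align the shapes first and keep several modes of variation, whereas this keeps only the dominant one, found by power iteration. The function names and the five triangle "shapes" are invented for illustration and do not come from the thesis.

```python
import math


def mean_shape(shapes):
    """Average each coordinate over all training shapes."""
    n = len(shapes)
    return [sum(s[i] for s in shapes) / n for i in range(len(shapes[0]))]


def covariance(shapes, mean):
    """Sample covariance matrix of the flattened landmark vectors."""
    d, n = len(mean), len(shapes)
    cov = [[0.0] * d for _ in range(d)]
    for s in shapes:
        dev = [s[i] - mean[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def first_mode(cov, iterations=200):
    """Power iteration: dominant eigenvalue and unit eigenvector of cov."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cv[i] for i in range(d))
    return eigenvalue, v


def synthesize(mean, mode, b):
    """Generative mode: new shape = mean + b * mode."""
    return [m + b * p for m, p in zip(mean, mode)]


# Hypothetical training set: triangles (x1, y1, x2, y2, x3, y3) whose apex
# height is the only thing that varies.
shapes = [[0.0, 0.0, 1.0, 0.0, 0.5, 1.0 + b] for b in (-0.2, -0.1, 0.0, 0.1, 0.2)]
mean = mean_shape(shapes)
eigenvalue, mode = first_mode(covariance(shapes, mean))
new_shape = synthesize(mean, mode, 0.3)
print(mode, eigenvalue, new_shape)
```

The last function is the "generative mode" mentioned in the article: once the model exists it can emit as many new shapes as you like, and every one of them fits the model by construction. Feeding such output back in as if it were real-world data is the problem described above.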
