Bonum Certa Men Certa

The LLM Ouroboros Phenomenon

posted by Roy Schestowitz on May 19, 2025,
updated May 19, 2025

An ouroboros in a 1478 drawing in an alchemical tract

Ancient Greek mythology came up with this concept of an ouroboros, wherein some animal - typically a snake for a feasible "IRL" (in real life) metaphor - eats itself by eating its own tail. We would not be the first to point out the analogy here for LLMs because an ouroboros is a good parable. This morning we catalogued two BSD and Linux sites complaining about desperate LLM scrapers staging a DDoS attack in pursuit of original, as in human-written, code or words. This isn't a new problem for us and in the past few days we served about half a million pages in Gemini Protocol, likely due to to LLM scrapers. It's obnoxious to say the least, but distinguishing benign from malicious (or worthless junk) requests is hard and a "moving target" (it's never enough as parasites learn to adapt).

This morning in IRC we made an assertion about LLMs and fake (slop) images. We also made several observations. Fact #1: over time slop gets worse (training set is like some blurry JPEG). Fact #2: People's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.

Are LLMs bound to not only get worse but also more easily detectable by an increasingly sceptical general public? TheLayoff.com has just responded to this.

An associate opines that fact #1 (that slop gets worse over time) is exacerbated by the flood of slop on the Net being snarfed up by newer bots and mistaken for training data. "Thus the feedback loop I mentioned a long time back and which Andy wrote about in depth." (He was referring to Dr. Farnell's good writings about this dilemma - as he did several times in The CyberShow's blog)

To a certain extent my Ph.D. thesis (dissertation) covered this about two decades ago. The associate says that it's a "well-known problem from days of old".

There are several unique aspects to this, including validation bias. To me it seemed a bit related to but not the same as over-training because, as an associate explains, "overtraining is something else: too much data and the patterns become locked too tightly to the training set and less useful for new data".

For an LLM to scan online its own output serves to affirm the mistakes, or the errors, often euphemised as mere "hallucinations", which are innocent, not libellous, and by no means "intentional" and "harmful". Dr. Farnell and Dr. Kate Brown responded to this last October in "Radical disbelief and its causes".

In the context of my thesis (dissertation), a concern was raised about what we back then called "synthetic data" finding its way "back" into the training set. So when you check brain MRI scans (which is what we did back then) you must ensure you only ever deal with real data, not mock or manipulated data that can confirm your own biases and "fit into" the model that generated it in the first place (in generative mode). To use the analogy of text-based LLMs, your BS is "truth" if your input is your own BS (output/s) and it would be deemed accurate, based on you (opposite of the notion of peer review in science). The associate correctly points out, based on a scan of my thesis (dissertation), that the strings "overtraining" and "over-training" are not in the dissertation, but we used different terms back then.

A squat toilet (also known as an Eastern, Turkish, Iranian or Natural-Position toilet). This one is in Turkey

"An LLM Ouroboros of shit", as the associate dubs it, would be statistical models (such as PDMs or AAMs*) treating computer-generated images as something from "the real world".

The so-called "generative hey hi" (genAI) "bros" won't allow the media to talk about such issues, at least if they can downplay the issues and deny/misportray them (in the media). But it's a real and growing problem. Its magnitude likely grows quadratically, not linearly. Just like other bubbles (overabundance based around hype), don't expect linear implosions. When it's gone (poof!), it's gone.

____

* PDM and AAM need expansion in the explanatory sense, not just words (in the acronyms). PDMs go back several decades ago they were invented or pioneered by the people who tutored me. They use mathematical, statistical models to perform multidimensional analysis of data variations, based upon principal component analysis (PCA). AAMs are an extension but with textures, not only points. This is really old stuff; even AAMs are over 23 years old; now the mainstream media pretends those are some kind of "revolution".

Other Recent Techrights' Posts

Links 26/10/2025: Microsoft Spies on Gamers, Open Transport Community Conference
Links for the day
Links 26/10/2025: LLM Slop / Plagiarism Programs Continue to Disappoint, CISA Layoffs Threaten Systems
Links for the day
Gemini Links 26/10/2025: Gemsync and Joining the Small Web
Links for the day
India.com a Click-baiting, SEO-Spamming, Slopfarming Heap
They do this almost every day
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Saturday, October 25, 2025
IRC logs for Saturday, October 25, 2025
Without XBox Consoles, XBox is No More, It's Just a Brand (More Rumours of Microsoft Ending XBox, Then Laying Off Lots of Staff)
All signs indicate that Microsoft wants to "exit" the XBox business (not brand), but it does not want to publicly admit this as it would alarm staff and shareholders
Gemini Links 25/10/2025: Portugal, Midnightpub, and "Tech Right Admins"
Links for the day
Almost 2026 Already (When We Turn Twenty)
In just over a year the site will turn 20
When "Sponsored Feature" in The Register MS Means Ponzi Scheme Promotion From the Communist Party of China (CPC)
the promotion of a financial scam
Week of EPO Leaks: Workers of the EPO Are Getting a Pay Cut While Prices Rise Fast
More to come in the next few days
Microsoft is Finally Giving Up on XBox, The Chief Says the Grapes Are Sour Anyway
Microsoft loses hundreds of dollars on each XBox that it sells
Slopwatch: LinuxSecurity, UbuntuPIT, and Various Slopfarms Propped up by Google News
Why can't Google News do better than this?
Links 25/10/2025: Two New Smokescreens for Scam Altman and ‘TikTok USA’ Remains in Limbo
Links for the day
Bad faith: can't change Debian Social Contract (DSC) without unanimous consent of every joint author
Reprinted with permission from Daniel Pocock
Confirmed: Very Close Friend of Bill Gates and Microsoft's Biggest Patent Troll Nathan Myhrvold Flew the Lolita Express (a Gateway to Pedophilia), According to Bill Gates-Sponsored Seattle Times
There is no speculation or any "conspiracy theories" here;' those are verified facts
Gemini Links 25/10/2025: "The Highest Leader of The Global Civil Society Community", SSL Certificates Causing Bitrot
Links for the day
Links 25/10/2025: Target Layoffs and "Shutdown Sparks 85% Increase in US Government Cyberattacks"
Links for the day
"Big Data" Was a Big Lie
Remember "Big Data"? Remember "Data Scientists"...?
statCounter Has Been Broken for a Long Time
Considering the huge proportion of Web requests that come from LLM bots (more so this past year or two), statCounter may struggle to justify the operating costs
Techrights Anniversary Party on November 7th
Let us know if you need any accommodation-related arrangements
Trends That Must Alarm Microsoft and Mozilla
Expect Firefox to no longer be supported by various sites in the US
Why Microsoft Became the Layoffs Leader
The corporate media is projecting or signalling its own dishonesty when it tells us that Microsoft is a very "valuable" company while the data shows Microsoft is also a "market leader" in layoffs
Speaking for Ourselves and Letting the Facts Speak for Themselves
we've already published over 50,000 pages
For Second Time in a Day The Register MS Takes Money From Private Companies to Sell a Ponzi Scheme
Do not have empathy for those who have zero empathy towards you
IBM is Misleading IBM Shareholders
IBM is still all about vapourware and buzzwords
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Friday, October 24, 2025
IRC logs for Friday, October 24, 2025
The Serial Slopper Starts Up - or Restarts - His Plagiarism Machine (LLMs)
Serial Sloppers like these don't belong in news sites. That's why he got sacked by BetaNews.
Links 24/10/2025: Esperanto Music History, Anxiety, and New Portals
Links for the day
[Video] Richard Stallman's Talk in Sweden, Attended by Nearly 700 People, is Now Online
The Web page is in Swedish, but the talk is in English
Slopwatch: LinuxSecurity.com, Linux Journal, and Pet Slopfarms of Google News
Why does Google News still advance these fake sites to the top of search results?
Links 24/10/2025: Inequality Grows, Billion-Dollar Scam Center Industry
Links for the day
Links 24/10/2025: "Independent Media in Cambodia is Collapsing" and Serious F5 Breach
Links for the day
Coping With the Site Going More Mainstream
Fame is no laughing matter
They Never 'Put Down' Corporations
There are "pests" that are traded in Wall Street
21 Pages in Less Than 7 Hours is No Joking Matter
We've become a lot more effective and efficient
Correct Information is a Valued Asset in the Age of Slopfarms and Public Relations (PR) or Spin
Publishing suppressed facts is never easy
The Register MS Continues to Bag Money to Promote a Ponzi Scheme, Even Money From China
Today in the front page
analytics.usa.gov: The Only Supported Version of Windows (This Past Week) is Only Used by About 13.9% of People in the US, the Home Base of Windows
Even Vista 7 is still used more
Rust is Very Secure
If only Rust itself is secure
Who Will be Held Accountable for Breaking Ubuntu by Imposing Rust on Otherwise-Functional Programs, in Effect Replacing GNU With Proprietary Microsoft (GitHub)?
they're practical people who merely point out that a bunch of buffoons not only ruin Ubuntu but also every future distro based on Ubuntu
Generation Chaff - Phase VIII: In Summary
Like "Science" with a capital "S", what we see here commercial interests usurping everything
Generation Chaff - Phase VII: Curtailing Alternative Media
There was always an obligation - a collective duty of sorts - to uphold independent journalism
Generation Chaff - Phase VI: Centralisation of Information (X, Cheetok/Fentanylware)
Would you trust information when controlled by such people?
Generation Chaff - Phase V: Censorship of Dissent (Painted as Harassment or Terrorism)
Censorship is all around us now
Generation Chaff - Phase IV: Apps Only Few Companies Decide On
Tools are being collectively confiscated, under the premise or false prospect of "security"
Generation Chaff - Phase III: Slop and Plagiarism
A lot of the current so-called 'economy' is built upon false valuations
Generation Chaff - Phase II: "Cloud", Blockchains and Other Hype
For those of us who turned down those propositions there was a struggle; we needed to justify not having skinnerboxes or "social" accounts in some site run by a private company
Generation Chaff - Phase I: Social Control Media
IRC predates the Web
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Thursday, October 23, 2025
IRC logs for Thursday, October 23, 2025
More Clues Shed on Collapse of Microsoft XBox
XBox is basically circling down the drain as Microsoft implements 2-3 waves of layoffs each month
'Vibe Coding' Doesn't Work
In a lot of ways, so-called 'Vibe Coding' is already considered vapourware or a passing fad promoted in the media by managers who try to justify mass layoffs, especially ridding companies of "very expensive" software engineers
Links 24/10/2025: Microsoft's Killing of XBox Connected to Revenue/Profit Problems, "How Elon Musk Ruined Twitter"
Links for the day
Gemini Links 24/10/2025: 86,400 Seconds and "Society's Task"
Links for the day