Bonum Certa Men Certa

The LLM Ouroboros Phenomenon

posted by Roy Schestowitz on May 19, 2025,
updated May 19, 2025

An ouroboros in a 1478 drawing in an alchemical tract

Ancient Greek mythology came up with the concept of an ouroboros, wherein some animal - typically a snake, for a feasible "IRL" (in real life) metaphor - eats itself, starting with its own tail. We would not be the first to point out the analogy to LLMs; an ouroboros is a good parable. This morning we catalogued two BSD and Linux sites complaining about desperate LLM scrapers staging a DDoS attack in pursuit of original (as in human-written) code or words. This isn't a new problem for us; in the past few days we served about half a million pages over Gemini Protocol, likely due to LLM scrapers. It's obnoxious to say the least, but distinguishing benign requests from malicious ones (or worthless junk) is hard and a "moving target" (it's never enough, as parasites learn to adapt).

This morning in IRC we made an assertion about LLMs and fake (slop) images. We also made several observations. Fact #1: over time slop gets worse (the training set is like some blurry JPEG). Fact #2: people's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.

Are LLMs bound to not only get worse but also more easily detectable by an increasingly sceptical general public? TheLayoff.com has just responded to this.

An associate opines that fact #1 (that slop gets worse over time) is exacerbated by the flood of slop on the Net being snarfed up by newer bots and mistaken for training data. "Thus the feedback loop I mentioned a long time back and which Andy wrote about in depth." (He was referring to Dr. Farnell's good writings about this dilemma, as he did several times in The CyberShow's blog.)

To a certain extent my Ph.D. thesis (dissertation) covered this about two decades ago. The associate says that it's a "well-known problem from days of old".

There are several unique aspects to this, including validation bias. To me it seemed a bit related to, but not the same as, over-training because, as an associate explains, "overtraining is something else: too much data and the patterns become locked too tightly to the training set and less useful for new data".

When an LLM scans its own output online, this serves to affirm the mistakes, or the errors, often euphemised as mere "hallucinations", which are supposedly innocent, not libellous, and by no means "intentional" or "harmful". Dr. Farnell and Dr. Kate Brown responded to this last October in "Radical disbelief and its causes".

In the context of my thesis (dissertation), a concern was raised about what we back then called "synthetic data" finding its way "back" into the training set. So when you check brain MRI scans (which is what we did back then) you must ensure you only ever deal with real data, not mock or manipulated data that can confirm your own biases and "fit into" the model that generated it in the first place (in generative mode). To use the analogy of text-based LLMs, your BS is "truth" if your input is your own BS (your output/s); it would be deemed accurate, but only based on you (the opposite of the notion of peer review in science). The associate correctly points out, based on a scan of my thesis (dissertation), that the strings "overtraining" and "over-training" are not in the dissertation; we used different terms back then.
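The loop described above, synthetic data finding its way back into the training set, can be illustrated with a toy sketch. This is not the method from the thesis and not how any real LLM is trained; it assumes the simplest "model" imaginable, a one-dimensional Gaussian, which is fitted to data, then asked to generate the next training set, and so on. The function names, the sample size and the number of generations are all made up for illustration.

```python
import random
import statistics


def fit_gaussian(samples):
    """'Train' the toy model: estimate mean and standard deviation."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def self_training_loop(generations=300, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Returns the fitted standard deviation at each generation.
    (Hypothetical helper; all parameter values are illustrative.)
    """
    rng = random.Random(seed)
    # Generation 0 is the only one that sees "real" data (standard normal).
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    spread_history = []
    for _ in range(generations):
        mean, std = fit_gaussian(data)
        spread_history.append(std)
        # The next training set is purely synthetic: the model's own output.
        data = [rng.gauss(mean, std) for _ in range(sample_size)]
    return spread_history


if __name__ == "__main__":
    history = self_training_loop()
    print("spread after first fit:", history[0])
    print("spread after last fit: ", history[-1])
```

In this toy setting the fitted spread tends to shrink toward zero: each generation loses a little of the original variation and never gets it back, because no real data re-enters the loop. The model keeps "agreeing with itself" while describing less and less of the real world.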

A squat toilet (also known as an Eastern, Turkish, Iranian or Natural-Position toilet). This one is in Turkey

"An LLM Ouroboros of shit", as the associate dubs it, would be statistical models (such as PDMs or AAMs*) treating computer-generated images as something from "the real world".

The so-called "generative hey hi" (genAI) "bros" won't allow the media to talk about such issues, at least not as long as they can downplay, deny or misportray them. But it's a real and growing problem. Its magnitude likely grows quadratically, not linearly. Just like other bubbles (overabundance based around hype), don't expect linear implosions. When it's gone (poof!), it's gone.

____

* PDM and AAM need expansion in the explanatory sense, not just words (in the acronyms). PDMs (point distribution models) go back several decades; they were invented or pioneered by the people who tutored me. They use mathematical, statistical models to perform multidimensional analysis of data variations, based upon principal component analysis (PCA). AAMs (active appearance models) are an extension, but with textures, not only points. This is really old stuff; even AAMs are over 23 years old; now the mainstream media pretends those are some kind of "revolution".
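To make the PCA-based idea a little more concrete, here is a minimal sketch of a shape model in that spirit. It is only an illustration under stated assumptions, not the original formulation: shapes are flat lists of landmark coordinates, the alignment step that real PDMs perform first is skipped, only the single largest mode of variation is extracted (by power iteration), and every function name is hypothetical.

```python
import math


def mean_shape(shapes):
    """Average each coordinate across all training shapes."""
    n = len(shapes)
    return [sum(column) / n for column in zip(*shapes)]


def covariance(shapes, mean):
    """Sample covariance matrix of the shape vectors."""
    dim, n = len(mean), len(shapes)
    cov = [[0.0] * dim for _ in range(dim)]
    for shape in shapes:
        d = [a - m for a, m in zip(shape, mean)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov


def leading_mode(cov, iterations=100):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # the training shapes do not vary at all
            return 0.0, v
        v = [x / norm for x in w]
        eigenvalue = norm
    return eigenvalue, v


def generate_shape(mean, mode, b):
    """Generative mode: mean shape plus b times the mode of variation."""
    return [m + b * p for m, p in zip(mean, mode)]


# Made-up training set: a triangle (x0, y0, x1, y1, x2, y2) whose apex
# moves up and down while the base stays fixed.
training = [[0.0, 0.0, 1.0, 0.0, 0.5, 1.0 + t] for t in (-2, -1, 0, 1, 2)]
mean = mean_shape(training)
variance, mode = leading_mode(covariance(training, mean))
taller_triangle = generate_shape(mean, mode, 2.0)
```

The last line is the "generative mode" mentioned earlier: the model produces a plausible new shape from a single parameter. Feeding such generated shapes back in as if they were measurements would only confirm what the model already contains.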
