Bonum Certa Men Certa

The LLM Ouroboros Phenomenon

posted by Roy Schestowitz on May 19, 2025,
updated May 19, 2025

An ouroboros in a 1478 drawing in an alchemical tract

Ancient Greek mythology gave us the concept of the ouroboros, wherein some animal (typically a snake, for a feasible "IRL", or in real life, metaphor) eats itself, starting with its own tail. We would not be the first to point out the analogy to LLMs; the ouroboros is a good parable. This morning we catalogued two BSD and Linux sites complaining about desperate LLM scrapers staging a DDoS attack in pursuit of original (as in human-written) code or words. This isn't a new problem for us; in the past few days we served about half a million pages over Gemini Protocol, likely due to LLM scrapers. It's obnoxious, to say the least, but distinguishing benign requests from malicious ones (or worthless junk) is hard and a "moving target" (it's never enough, as parasites learn to adapt).

This morning in IRC we made an assertion about LLMs and fake (slop) images. We also made several observations. Fact #1: over time slop gets worse (training set is like some blurry JPEG). Fact #2: People's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.

Are LLMs bound to not only get worse but also more easily detectable by an increasingly sceptical general public? TheLayoff.com has just responded to this.

An associate opines that fact #1 (that slop gets worse over time) is exacerbated by the flood of slop on the Net being snarfed up by newer bots and mistaken for training data. "Thus the feedback loop I mentioned a long time back and which Andy wrote about in depth." (He was referring to Dr. Farnell's good writings about this dilemma, as he did several times in The CyberShow's blog.)
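The feedback loop described above can be illustrated with a toy sketch. This is not an LLM: it assumes the "model" is a single Gaussian, fitted to a small sample and then retrained, generation after generation, only on samples of its own output. The function name, the sample size and the number of generations are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_feedback_loop(generations=200, sample_size=10, seed=0):
    """Refit a toy Gaussian 'model' on its own samples, one generation at a time.

    Returns the fitted spread (standard deviation) of every generation,
    starting with the fit to the original "real" data.
    """
    rng = random.Random(seed)
    # Generation 0 is trained on "real" data: draws from a standard normal.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    spreads = []
    for _ in range(generations + 1):
        # "Training": estimate the two parameters of the toy model.
        mu = statistics.mean(data)
        sigma = statistics.pstdev(data)
        spreads.append(sigma)
        # The next generation only ever sees synthetic output of this model.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return spreads


if __name__ == "__main__":
    spreads = simulate_feedback_loop()
    print("spread fitted to real data:     ", spreads[0])
    print("spread after feeding on itself: ", spreads[-1])
```

In this toy setting the fitted spread tends to shrink towards zero, because each refit loses a little of the variation in the original data and nothing ever puts it back. Real LLM training pipelines are far more complicated, so treat this only as an intuition pump for why synthetic output re-entering the training set is a concern.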

To a certain extent my Ph.D. thesis (dissertation) covered this about two decades ago. The associate says that it's a "well-known problem from days of old".

There are several unique aspects to this, including validation bias. To me it seemed a bit related to but not the same as over-training because, as an associate explains, "overtraining is something else: too much data and the patterns become locked too tightly to the training set and less useful for new data".

When an LLM scans its own output online, that serves to affirm the mistakes, or the errors, often euphemised as mere "hallucinations", which are supposedly innocent, not libellous, and by no means "intentional" or "harmful". Dr. Farnell and Dr. Kate Brown responded to this last October in "Radical disbelief and its causes".

In the context of my thesis (dissertation), a concern was raised about what we back then called "synthetic data" finding its way "back" into the training set. So when you check brain MRI scans (which is what we did back then) you must ensure you only ever deal with real data, not mock or manipulated data that can confirm your own biases and "fit into" the model that generated it in the first place (in generative mode). To use the analogy of text-based LLMs, your BS is "truth" if your input is your own BS (output/s) and it would be deemed accurate, based on you (opposite of the notion of peer review in science). The associate correctly points out, based on a scan of my thesis (dissertation), that the strings "overtraining" and "over-training" are not in the dissertation, but we used different terms back then.

A squat toilet (also known as an Eastern, Turkish, Iranian or Natural-Position toilet). This one is in Turkey

"An LLM Ouroboros of shit", as the associate dubs it, would be statistical models (such as PDMs or AAMs*) treating computer-generated images as something from "the real world".

The so-called "generative hey hi" (genAI) "bros" won't allow the media to talk about such issues, at least if they can downplay the issues and deny/misportray them (in the media). But it's a real and growing problem. Its magnitude likely grows quadratically, not linearly. Just like other bubbles (overabundance based around hype), don't expect linear implosions. When it's gone (poof!), it's gone.

____

* PDM and AAM need expansion in the explanatory sense, not just words (in the acronyms). PDMs (point distribution models) go back several decades; they were invented or pioneered by the people who tutored me. They use mathematical, statistical models to perform multidimensional analysis of data variations, based upon principal component analysis (PCA). AAMs (active appearance models) are an extension but with textures, not only points. This is really old stuff; even AAMs are over 23 years old; now the mainstream media pretends those are some kind of "revolution".
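To make the PCA part concrete, here is a minimal sketch of the idea behind a PDM, under loudly stated assumptions: the training "shapes" are made up (a unit square whose right edge slides sideways, plus a little noise), only the single largest mode of variation is extracted, and power iteration stands in for a full eigendecomposition. All function names are hypothetical and none of this is taken from the thesis.

```python
import math
import random


def mean_shape(shapes):
    """Average each coordinate over all training shapes."""
    n = len(shapes)
    return [sum(s[i] for s in shapes) / n for i in range(len(shapes[0]))]


def covariance(shapes, mean):
    """Sample covariance matrix of the flattened landmark coordinates."""
    n, d = len(shapes), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for s in shapes:
        dev = [s[i] - mean[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def leading_mode(cov, iterations=100, seed=0):
    """Power iteration: unit eigenvector and eigenvalue of the largest mode."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    return v, eigenvalue


def make_training_shapes(count=50, seed=1):
    """Made-up data: a unit square whose right edge slides horizontally."""
    rng = random.Random(seed)
    base = [0, 0, 1, 0, 1, 1, 0, 1]       # (x, y) of 4 landmarks, flattened
    true_mode = [0, 0, 1, 0, 1, 0, 0, 0]  # moves the two right-hand x values
    shapes = []
    for _ in range(count):
        b = rng.gauss(0.0, 0.3)           # how far the edge slides
        shapes.append([base[i] + b * true_mode[i] + rng.gauss(0.0, 0.005)
                       for i in range(8)])
    return shapes


shapes = make_training_shapes()
x_bar = mean_shape(shapes)
mode, variance = leading_mode(covariance(shapes, x_bar))
# Generative use: a new shape is the mean plus a weighted mode of variation.
new_shape = [x_bar[i] + 0.2 * mode[i] for i in range(8)]
```

The last line is the "generative mode" mentioned earlier: the model can synthesise new shapes from a mean and a weight. If such synthetic shapes were fed back in as training data, they would fit the model perfectly, which is exactly the confirmation problem the article describes.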
