Bonum Certa Men Certa

The LLM Ouroboros Phenomenon

posted by Roy Schestowitz on May 19, 2025,
updated May 19, 2025

An ouroboros in a 1478 drawing in an alchemical tract

Ancient Greek mythology came up with this concept of an ouroboros, wherein some animal - typically a snake for a feasible "IRL" (in real life) metaphor - eats itself by eating its own tail. We would not be the first to point out the analogy here for LLMs because an ouroboros is a good parable. This morning we catalogued two BSD and Linux sites complaining about desperate LLM scrapers staging a DDoS attack in pursuit of original, as in human-written, code or words. This isn't a new problem for us and in the past few days we served about half a million pages in Gemini Protocol, likely due to to LLM scrapers. It's obnoxious to say the least, but distinguishing benign from malicious (or worthless junk) requests is hard and a "moving target" (it's never enough as parasites learn to adapt).

This morning in IRC we made an assertion about LLMs and fake (slop) images. We also made several observations. Fact #1: over time slop gets worse (training set is like some blurry JPEG). Fact #2: People's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.

Are LLMs bound to not only get worse but also more easily detectable by an increasingly sceptical general public? TheLayoff.com has just responded to this.

An associate opines that fact #1 (that slop gets worse over time) is exacerbated by the flood of slop on the Net being snarfed up by newer bots and mistaken for training data. "Thus the feedback loop I mentioned a long time back and which Andy wrote about in depth." (He was referring to Dr. Farnell's good writings about this dilemma - as he did several times in The CyberShow's blog)

To a certain extent my Ph.D. thesis (dissertation) covered this about two decades ago. The associate says that it's a "well-known problem from days of old".

There are several unique aspects to this, including validation bias. To me it seemed a bit related to but not the same as over-training because, as an associate explains, "overtraining is something else: too much data and the patterns become locked too tightly to the training set and less useful for new data".

For an LLM to scan online its own output serves to affirm the mistakes, or the errors, often euphemised as mere "hallucinations", which are innocent, not libellous, and by no means "intentional" and "harmful". Dr. Farnell and Dr. Kate Brown responded to this last October in "Radical disbelief and its causes".

In the context of my thesis (dissertation), a concern was raised about what we back then called "synthetic data" finding its way "back" into the training set. So when you check brain MRI scans (which is what we did back then) you must ensure you only ever deal with real data, not mock or manipulated data that can confirm your own biases and "fit into" the model that generated it in the first place (in generative mode). To use the analogy of text-based LLMs, your BS is "truth" if your input is your own BS (output/s) and it would be deemed accurate, based on you (opposite of the notion of peer review in science). The associate correctly points out, based on a scan of my thesis (dissertation), that the strings "overtraining" and "over-training" are not in the dissertation, but we used different terms back then.

A squat toilet (also known as an Eastern, Turkish, Iranian or Natural-Position toilet). This one is in Turkey

"An LLM Ouroboros of shit", as the associate dubs it, would be statistical models (such as PDMs or AAMs*) treating computer-generated images as something from "the real world".

The so-called "generative hey hi" (genAI) "bros" won't allow the media to talk about such issues, at least if they can downplay the issues and deny/misportray them (in the media). But it's a real and growing problem. Its magnitude likely grows quadratically, not linearly. Just like other bubbles (overabundance based around hype), don't expect linear implosions. When it's gone (poof!), it's gone.

____

* PDM and AAM need expansion in the explanatory sense, not just words (in the acronyms). PDMs go back several decades ago they were invented or pioneered by the people who tutored me. They use mathematical, statistical models to perform multidimensional analysis of data variations, based upon principal component analysis (PCA). AAMs are an extension but with textures, not only points. This is really old stuff; even AAMs are over 23 years old; now the mainstream media pretends those are some kind of "revolution".

Other Recent Techrights' Posts

Skype Fell Off a Cliff (Microsoft Killed It), All Microsoft Has Left Now is Slop and Spaghetti Code
"This isn’t about AI. This is a puppet show to drive stock prices up and down."
Slopfarms (Machine-Generated Fake News Sites Authored by Bots With Slop Images) Spread GNU FUD
This isn't about Linux (GNU doesn't run just on Linux)
United States Federal Government's Digital Analytics Program (DAP): GNU/Linux Users Represent Close to 6% of Visitors This Year
How far has GNU/Linux gotten? Very far!
The "LLM Ouroboros of Shit" is Complemented by Even Worse Phenomena Caused by Microsoft's Contribution of SPAM and Pollution
Microsoft became a world leader in promotion of LLM slop
The LLM Ouroboros Phenomenon
Fact #1: over time slop gets worse (training set is like some blurry JPEG). Fact #2: People's "smell" for slop improves over time, as they 'train' on slop and can detect it based on prior encounters. Put 1 and 2 together.
How We Defeated DDoS Attacks
One of the best things one can do is migrate to an SSG
Microsofters Issuing Threats to Microsoft Critics Who Blog About Microsoft
So far we see that their "legal strategy" revolves around trying to discredit people like Theodore Ts'o
 
Microsoft a Top Sponsor at Red Hat Summit (IBM Selling Proprietary Spyware and Back Doors in a "Red" Trench Coat)
They both work for Microsoft
The Official SUSE Blog Uses LLM Slop to Compose Fake Articles Promoting Microsoft and Azure
even a little slop spoils the broth
Links 19/05/2025: Charges of Blackmailing Over Son Heung-min, Chad Opposition Leader Detained
Links for the day
Gemini Links 19/05/2025: Ableism, Silicon Monkeys, and More
Links for the day
Links 19/05/2025: Political Catchup and CISA Advisories
Links for the day
TheLayoff.com Has Begun Deleting Trolls/AstroTurfers Infesting the IBM Section to Discourage On-Topic Discussion About Culls and Maladministration (Bad Strategy)
Moderators have realised there's a problem
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Sunday, May 18, 2025
IRC logs for Sunday, May 18, 2025
Gemini Links 18/05/2025: Five Years on Gemini and Atom Feeds over Gopher
Links for the day
Links 18/05/2025: F.D.A. More Sceptical of COVID-19 Vaccines, UK Charges 3 Iranian Nationals In Alleged Attack Plot Against Journalists
Links for the day
Gemini Links 18/05/2025: "Finally Upgraded" and "Rebooting"
Links for the day
There Are Days or Occasions Where gemini:// Requests Almost Exceed http(s):// and Gemini Protocol Isn't Even 6 Yet
Gemini Protocol turns 6 one month from now
Abundance of Good Code, "Just Like Air."
Richard Stallman's seminal manifesto and foundational (practical) work on GNU gave us a very solid system that facilitates productive work without concerns over spyware
Messages in TheLayoff.com Drowned Out by LLM Slop (Comments Focused on Replying to Bot-Generated Provocation)
apparently shaking hands with nazis isn't as bad as calling your git repository's main branch "master"
The Importance of Full Disclosure and Transparency Online
there will be full transparency, as always
Slopwatch: Slopfarms and Serial Sloppers Still at It
Apparently Google is too understaffed to figure that out
Links 18/05/2025: Decreased Prospects of Science Careers, Disappearance of Journalists
Links for the day
Microsofters Have a Long History Trying to Take Down Techrights by Sending Threats to Webhosts
picking on women
Links 18/05/2025: Science, Censorship and European Commission Taking on Monopoly Abuse by Microsoft
Links for the day
Gemini Links 18/05/2025: Šibenik and SFJAZZ Historical Archive
Links for the day
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Saturday, May 17, 2025
IRC logs for Saturday, May 17, 2025
Links 17/05/2025: Microsoft Kills "Surface Laptop Studio" (More Canceled Products/Units), Groups Caution About Harms of Social Control Media
Links for the day
Gemini Links 17/05/2025: Sympathy Algorithm and SSH on Alternative Ports
Links for the day
Inviting the Founder of GNU/Linux to Events (It Only Costs His Travel Expenses) and Recalling the True Origins
It's reassuring to see belated recognition
Slopwatch: Microsoft's Anti-Linux Propaganda and Cover-up, Slopfarms Clogging Up Google News
slop-tracking activities that observe googlebombing of "Linux"
AstroTurfing by IBM in thelayoff.com is Highly Risky (and Likely Outsourced)
Microsoft did this in Reddit (and got caught), so why won't IBM too?
Links 17/05/2025: Stabber of Salman Rushdie Sentenced to 25 Years in Prison
Links for the day
The Microsofters Have Just Shared Privileged Trial Data With Microsoft
There are serious ramifications for liability accountability as Microsoft salaries sponsor these SLAPPs
Trolls With LLM Slop Are Disrupting Communications About Mass Layoffs at IBM
LLM slop to drown out the signal
Gemini Links 17/05/2025: Happier on Gemini and Manipulating Reddit
Links for the day
ComEd and Microsoft: A Mess of Spaghetti Held Together By Circus Clowns
Reprinted with permission from Ryan Farmer
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Friday, May 16, 2025
IRC logs for Friday, May 16, 2025