Bonum Certa Men Certa

What the LLM Scrapers Are Doing to Tux Machines

posted by Roy Schestowitz on Feb 27, 2025


Crossposted from Tux Machines

Earlier this month Jonathan Corbet published "Fighting the AI scraperbot scourge" on LWN (Linux Weekly News). The article became freely accessible to everybody earlier today. Corbet noted that the "LWN content-management system contains over 750,000 items (articles, comments, security alerts, etc) dating back to the adoption of the "new" site code in 2002. We still have, in our archives, everything we did in the over four years we operated prior to the change as well. In addition, the mailing-list archives contain many hundreds of thousands of emails. All told, if you are overcome by an irresistible urge to download everything on the site, you are going to have to generate a vast amount of traffic to obtain it all. If you somehow feel the need to do this download repeatedly, just in case something changed since yesterday, your traffic will be multiplied accordingly. Factor in some unknown number of others doing the same thing, and it can add up to an overwhelming amount of traffic."

We have almost 250,000 pages and perhaps 300,000 objects. Some years ago scrapers became a pain in the arse (PITA), so we started converting everything to static pages. That transition was completed in September 2023.

So far today it looks like we will have served about 1.5 million requests by midnight. That is more than 60,000 per hour, or roughly 1,000 per minute. The server can cope with that, but for ordinary users the site feels slower because the queue of requests keeps growing and is almost never empty.
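For anyone who wants to check the arithmetic, here is a small sketch. The only input is the 1.5 million requests per day estimated above; the per-second figure is an extra derived value, not a measurement from our logs.

```python
# Figure from the post: about 1.5 million requests expected over one day.
requests_per_day = 1_500_000

# Spread the daily total evenly over the day (real traffic is burstier).
requests_per_hour = requests_per_day / 24
requests_per_minute = requests_per_day / (24 * 60)
requests_per_second = requests_per_day / (24 * 60 * 60)

print(f"{requests_per_hour:,.0f} per hour")      # 62,500 per hour
print(f"{requests_per_minute:,.0f} per minute")  # 1,042 per minute
print(f"{requests_per_second:.1f} per second")   # 17.4 per second
```

These are averages. Scraper traffic tends to arrive in bursts, so peaks may be well above them.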

Blacklisting offending IP addresses or address blocks might be the last resort. As an associate put it last night, "the bots are killing dynamically generated sites. They are written by maliciously incompetent bumbling idiots with no regard for their impact on sites in any way. That includes complete disregard for copyright and other legal aspects."
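To illustrate what picking out address blocks could look like, here is a minimal sketch. It is not what we actually run. The function name `busiest_blocks`, the /24 grouping, and the threshold are all hypothetical choices. It assumes the client addresses have already been extracted from the access log as plain strings, and it only groups IPv4 addresses.

```python
import ipaddress
from collections import Counter


def busiest_blocks(client_ips, prefix_len=24, threshold=1000):
    """Return (network, hits) pairs for blocks with at least `threshold` hits.

    Hypothetical helper: prefix_len and threshold are illustrative defaults,
    not values taken from any real configuration.
    """
    counts = Counter()
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
        if addr.version != 4:
            continue  # assumption: this sketch only groups IPv4 addresses
        # strict=False masks the host bits, giving the enclosing network.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[network] += 1
    # most_common() sorts by hit count, busiest block first.
    return [(net, hits) for net, hits in counts.most_common() if hits >= threshold]


# Documentation-only addresses (RFC 5737), not real offenders.
sample = ["192.0.2.10"] * 3 + ["192.0.2.77"] * 2 + ["198.51.100.5"]
for network, hits in busiest_blocks(sample, threshold=4):
    print(network, hits)  # 192.0.2.0/24 5
```

Grouping by block rather than by single address matters because scrapers often spread their requests over many addresses in the same range. Passing the resulting list to a firewall or web server is deliberately left out, since that depends on the setup.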
