Bonum Certa Men Certa

What the LLM Scrapers Are Doing to Tux Machines

posted by Roy Schestowitz on Feb 27, 2025

Mist over Niagara Falls

Crossposted from Tux Machines

Earlier this month Jonathan Corbet published "Fighting the AI scraperbot scourge" (in LWN, or Linux Weekly News). The article became freely accessible to everybody earlier today. Corbet said "LWN content-management system contains over 750,000 items (articles, comments, security alerts, etc) dating back to the adoption of the "new" site code in 2002. We still have, in our archives, everything we did in the over four years we operated prior to the change as well. In addition, the mailing-list archives contain many hundreds of thousands of emails. All told, if you are overcome by an irresistible urge to download everything on the site, you are going to have to generate a vast amount of traffic to obtain it all. If you somehow feel the need to do this download repeatedly, just in case something changed since yesterday, your traffic will be multiplied accordingly. Factor in some unknown number of others doing the same thing, and it can add up to an overwhelming amount of traffic."

We have almost 250,000 pages and perhaps 300,000 objects. Some years ago scrapers became a pain in the arse (PITA), so we started converting everything to static. The transition was completed entirely in September 2023.

So far today it looks like we'll have served about 1.5 million requests at midnight. That's more than 50,000 per hour or 1,000 per minute. The server can cope with that, but for ordinary users the site feels slower as the queue of requests grows and is almost never vacant.

Blacklisting offensive IP address/blocks might be the last resort. As an associate put it last night, "the bots are killing dynamically generated sites. They are written by maliciously incompetent bumbling idiots with no regard for their impact on sites in any way. That includes complete disregard for copyright and other legal aspects."

Other Recent Techrights' Posts

[Meme] 9AM Meeting at Brett Wilson LLP
Brett Wilson LLP in space
99.99% Uptime in First Half of 2025
Since January there was only one noticeable outage
When People Call a Best/Close Friend of Bill Gates a "Serial Rapist"
Good thing that the Linux Foundation keeps the "Linux" trademark ("Linux Mark") clean
 
Free Software Foundation, Inc. (FSF) Inches Towards 75% of Fund-Raising Target
Will the cutoff date be extended again?
Gemini Space (or Geminispace) Grows, But Usage of Certificate Authority Let's Encrypt Drops Further
Ideally, all Gemini capsules should use self-signed certificates
Links 18/07/2025: More Microsoft Layoffs in Activision, The New Stack (Sponsored by Microsoft) Complains About Openwashing
Links for the day
Gemini Links 18/07/2025: OCC25 Gnus for Reading Usenet and RSS Feeds, Small Web Updates
Links for the day
Listing as Staff People Who Left the Company More Than Six Years Earlier
There are apparently no laws against that
Brian Fagioli Shovels Up LLM Slop (Plagiarism) Onto Slashdot, Then Uses Slashdot for Affirmation or as Badge of Honour
Notice how some of his latest slop is presented ("as featured on Slashdot")
Social Control Media Productivity
Snapping photos of the bone
The Law Firm SLAPPing Us For the Microsofters Lost 72% of Its Tangible Assets in the Past Year, According to Its Own Reports
That might help explain why they're willing to tolerate serial stranglers from Microsoft as clients
Slopwatch: LinuxSecurity.com Slopfarm and Slopfarms Propped Up by Google News
"As LLM slop is foisted onto the WWW in place of knowledge and real content, it now gets ingested and processed by other LLMs, creating a sort of ouroboros of crap."
Links 18/07/2025: Weather Events and Health Hazards
Links for the day
Microsoft's All-Time Low in Finland
Microsoft is in a freefall
Security: Shane Wegner & Debian statement of incompetence
Reprinted with permission from Daniel Pocock
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Thursday, July 17, 2025
IRC logs for Thursday, July 17, 2025
Gemini Links 17/07/2025: "Goodreads for Gemini" and Defence of "The Small Web"
Links for the day
Links 17/07/2025: Anger and Morale Issues at Microsoft, Wars and Conflicts Get Digital
Links for the day
CALEA / CALEA2 is the Real Problem, Not Chinese Operatives Exploiting CALEA / CALEA2 (as Any Other Nation Can)
CALEA / CALEA2 is more of a front door than a back door
Nils Torvalds and Anna "Mikke" Torvalds (née Törnqvis) Hopefully Use GNU/Linux by Now
"Torvalds Family Uses Windows, Not Linus’ Linux"
Attack of the Slopfarms
FUD-amplifying bots with slop images, slop text (LLM slop)
Not My Problem, I Don't Care
Context/inspiration: Martin Niemöller
Honest Journalism About the European Patent Office Ceased to Exist After SLAPPs and Bribes to the Media
The EPO is basically a Mafia
Microsoft Bankruptcy in Russia, Shutdown in Pakistan, What Next?
It seems possible that in 2025 alone Microsoft will have laid off over 50,000 workers
Life Became Simpler When I Stopped Driving and I Don't Miss Driving When I See "Modern" Cars
Gee, wonder why car sales have plummeted...
Why I Believe Brett Wilson LLP and Its Microsoft Clients Are All Toast
So far our legal strategy has worked perfectly
EPO Jobs Are Very Toxic and Bad for One's Health
Health first, not monopolies
Response to Ryo Suwito Regarding the Four Freedoms
the point of life isn't to make more money
Microsoft's Morale Circling Down the Drain
Or gutter, toilet etc.
What Matters More Than "Market Share"
The goal is freedom, not "market share"
Tech Used to be Fun. To Many of Us It's Still Fun.
You can just watch it from afar and make fun of it all
Links 17/07/2025: "Blog Identity Crisis" and Openwashing by Nvidia
Links for the day
Greffiers and the US Attorney of the Serial Strangler From Microsoft
The lawsuit can help expose extensive corruption in the American court system as well
Credit Suisse collapse obfuscated Parreaux, Thiébaud & Partners scandal
Reprinted with permission from Daniel Pocock
The People Who Promoted systemd in Debian Also Promote Wayland
This is not politics
UK Media Under Threat: Cannot Report on Data Breach, Cannot Report on Microsoft Staff Strangling Women
The story of super injunction (in the British media this week, years late)
Victims of the Serial Strangler From Microsoft, Alex Balabhadra Graveley, Wanted to Sue Him But Lacked the Funds (He Attacked Their Finances)
Having spoken to victims of the Serial Strangler From Microsoft
Links 17/07/2025: Science, Hardware, and Censorship
Links for the day
Gemini Links 17/07/2025: Staying in the "Small Web" and Back on ICQ
Links for the day
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Wednesday, July 16, 2025
IRC logs for Wednesday, July 16, 2025
Under the Guise of "MIT Technology Review Insights" the Site MIT Technology Review Posts Corporate Spam as 'Articles'
Some of the articles aren't even articles but 'hit pieces' against Free software and some are paid advertisements
Brett Wilson LLP Has Track Record in Scam Coin Cases (e.g. Craig Wright and More), Now It Works for 'Crypto' Scam Purveyors
But wait, it gets worse
Exclusive: corruption in Tribunals, Greffiers, from protection rackets to cat whisperers
Reprinted with permission from Daniel Pocock
Will Brett Wilson LLP Handle Its Own Winding Up Petition or be Struck Off for Overt Abuse of Process?
Today we sue not only the first Microsofter
Links 16/07/2025: Chip Bans and Microsoft’s “Digital Escort” Program
Links for the day
Ubuntu Becomes Microsoft GitHub, Based on Decision Made by British Army Officer
You're hopeless, Canonical
Revolving Doors: One Day You're a Judge, the Next Day You're an Attorney Paying Public Officials and Working for Violent and Dangerous Microsoft Employees
how the US justice system works
Sharing Code and Recipes
It helps explain the triviality of software freedom
Slopwatch: Noise, Plagiarism and Even Fear, Uncertainty, Doubt/Fear-mongering/Dramatisation
What are we meant to do to prevent a false association or misleading connotations? Game the LLMs? No. Boycott slopfarms.
How Many Women Has Microsoft's Alex Balabhadra Graveley Already Strangled and Where Does That End?
If you too are a victim of this man and wish to share information, contact us
Gemini Links 16/07/2025: BaseLibre Numerical System and Simple Web Browsing with TLS
Links for the day
Links 16/07/2025: Fascist Slop Takes "Intelligence" Clothing, New Criminal Case Against MElon
Links for the day
"We Might Save Somebody's Life"
I follow the example of my father
Why I am Suing the Serial Strangler From Microsoft, Alex Balabhadra Graveley, in the UK High Court This Week
Out of respect to the process and to the Court, I shall not share any pertinent details about the case
Links 16/07/2025: China’s Economy Grows Steadily, France Takes Action Regarding Harm to Children by GAFAM and Fentanylware (TikTok)
Links for the day
It is Not About Politics
Beware the people who try to make this about politics
Good Journalism Saves Lives
a shocking number of women die or get seriously hurt every day due to violence from a partner
Recognition of Women's Contributions to Free Software
Being passive is not an option when bad things are happening
Slopfarms Are Going to Perish Because Public Opinion is Changing
Many slopfarms will simply go offline
19 Years of Standing Up for Justice, Equality, and Truth
This week we shall take it up a notch
Gemini Links 16/07/2025: Tmux and OCC25 Working TLS
Links for the day
Over at Tux Machines...
GNU/Linux news for the past day
IRC Proceedings: Tuesday, July 15, 2025
IRC logs for Tuesday, July 15, 2025