How We Defeated DDoS Attacks
One of the main reasons we went static 3 years ago, starting with Tux Machines, was the DDoS attacks the site had long been subjected to by bots misusing the back end and overwhelming the database. We had to write and run programs to mitigate the attacks, as manual intervention wasn't possible while sleeping or away from home. I remember having to leave the gym early and literally run home to 'fix' Tux Machines. Those were unpleasant times. Then there was the recovery effort, which sometimes meant working overnight to re-add pages.
I sacrificed my health to keep Tux Machines online. This went on for about 5 years.
The moment Tux Machines ran purely on the Static Site Generator (SSG), old pages included, these issues were resolved overnight. Since then Tux Machines has been up about 99.99% of the time (reboots don't take long).
The site is very active, and adding new pages doesn't take as long as it used to (with Drupal everything was slow and felt 'heavy').
Yesterday we saw this BSD site stating: "The amount of bot traffic has increased significantly, I assume to find content for AI, and ignoring robots.txt and copyright. I don't think people realize the scale of this. Its causing a denial of service attack in server resources and developer time."
Identifying rogue bots isn't easy. It's possible, but it takes a lot of effort. It's a moving target.
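As a rough illustration of what that effort looks like, here is a minimal sketch (not the tooling we actually run) that counts requests per IP address in a web server access log and prints the heaviest hitters. The log path and the threshold are assumptions; real detection also needs user-agent checks, time windows, and comparison against what robots.txt actually allows.

```python
#!/usr/bin/env python3
# Sketch: flag clients that hammer the server by counting requests
# per IP address in a combined-format access log.

import re
from collections import Counter

LOG_FILE = "/var/log/nginx/access.log"   # assumed location
THRESHOLD = 1000                          # requests; tune to your traffic

ip_pattern = re.compile(r"^(\S+)")        # first field is the client IP

counts = Counter()
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = ip_pattern.match(line)
        if match:
            counts[match.group(1)] += 1

for ip, hits in counts.most_common():
    if hits < THRESHOLD:
        break                             # most_common() is sorted, so stop here
    print(f"{ip}\t{hits}")
```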
One of the best things one can do is migrate to an SSG.
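To show why the static approach helps, here is a minimal SSG sketch, assuming a directory of Markdown files and the Python 'markdown' package; it is not the generator Tux Machines uses. Every page is rendered once, ahead of time, so serving a request is just a file read, with no database or application code in the hot path for bots to overwhelm.

```python
#!/usr/bin/env python3
# Minimal static site generator sketch: render each Markdown page to a
# standalone HTML file that a web server can serve directly.
# The directory layout is an assumption for illustration.

from pathlib import Path
import markdown  # pip install markdown

SRC = Path("pages")    # one .md file per article (assumed layout)
OUT = Path("public")   # directory the web server serves directly

TEMPLATE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{title}</title></head>
<body><h1>{title}</h1>{body}</body></html>"""

OUT.mkdir(exist_ok=True)
for page in SRC.glob("*.md"):
    body = markdown.markdown(page.read_text(encoding="utf-8"))
    title = page.stem.replace("-", " ")
    out_file = OUT / f"{page.stem}.html"
    out_file.write_text(TEMPLATE.format(title=title, body=body), encoding="utf-8")
    print(f"wrote {out_file}")
```

The generated directory can then be served by any ordinary web server, with caching in front if desired. █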
Update: Hours ago Mageia reported experiencing the same issues:
An avalanche of AI bots is repeatedly taking parts of our website down
We have always had bots visiting our website. They were mostly kind bots, like the crawlers that keep the databases of search engines up-to-date. Those kind bots start by looking at our robots.txt files before doing anything, and respect the restrictions that are set in those files.
However, things have changed. Like other websites, for instance Wikipedia, we are more and more being visited by AI scrapers, bots that scrape the Internet for anything they can find to train AI applications. They are usually extremely hungry for information, so they download much, much more than an ordinary user would do. Moreover, many of them are impolite: they don’t respect the rules set in our robots.txt files, they hide who they really are, they don’t put a little pause in between requests – on the contrary, they hammer our servers with requests from lots and lots of different IP addresses at the same time. The result is that parts of mageia.org, like our Bugzilla, Wiki and Forums, become unreachable.