Why soylentnews.org Has Been Having Technical Difficulties Lately
Old: Soylent News Editor Stays, Trolls Leave Instead
Lately, we regret to say, soylentnews.org has had a lot of downtime and, at times, error messages on its Web pages. The problems have affected both the IRC network and the Web site. Looking at recent IRC transcripts (from periods when the network was up), we see messages like "uh oh 503 again", "503 SN home page", and "Another 503 just trying to look at a comment."
"I was born in Manchester," Jan Rinok said, "we get worried if it doesn't rain..."
It's great to see that Rinok is still deeply involved in spite of the vicious trolling against him.
The network has been going up and down quite a lot this past week.
Here are some relevant discussions from the #soylent channel, which everybody can access and see:
<janrinok> We know it is a timeout, but where? Yesterday, kolie discovered that the db was running out of connections. Something is keeping the connections open far too long. However, this was resolved by increasing the size of the connection pool available to the db, which solved that problem,
<janrinok> What I have noticed this morning is that the response to a request is often being built up on my screen in parts. Normally this would happen so quickly as to not be noticeable, but even some elements are being delayed during the request processing.
<fab23> could be caused of some crawler requesting db intense pages too quickly
<janrinok> That might give kolie an idea where to start looking. One problem is that there is no test harness for Rehash. Over the years several options in the code have been removed for very good reasons at the time. The problem is that they have never been tested as a complete entity other than letting the users get at them.
<janrinok> Or, perhaps more correctly, the various elements have never been fully tested for corner and edge cases before being included in the final build. It is possible that some of these are now conflicting which each other, causing the problems to appear now - although the code changes were made a long time ago.
<janrinok> fab23 - agreed. But overnight it looks like kolie has re-enabled that part of the firewall that takes care of that problem, but the 5xx are still appearing.
<janrinok> The requests by AI bots was still within the capabilities of the server, but we are now blocking some of those sources anyway.
<janrinok> Another problem with blocking the bots is that some of them use the same vpns that our users do. Blocking the source could well prevent our users from accessing the site. If they have paid for a vpn service they might be unimpressed to find that we are intentionally blocking it so we have to tread gently.
<janrinok> Some of our users have already complained of this (although not very many of them). They think that the block is directed at them personally and not the bots which are abusing our site. We have to be very careful not to wield too big a hammer to crack a relatively small nut.
<fab23> I see
The channel also mentions the Varnish cache server and has reports of very high load: "I think we need to allocate some more cores to the infra. The load numbers are just too high. I got them down to 28 from 250. top - 19:18:35 up 2 days, 18:39, 4 users, load average: 18.55, 64.41, 100.99" (by kolie)
We hope they'll sort it out soon. They deal with a relatively complex system. █