The FSF's SysOps Team Recovered From Serious Hardware Issue Within Hours
An hour ago we mentioned the latest update from PCLinuxOS, which suffered an actual fire at the site where everything was hosted.
About half a day ago I noticed that most, if not all, GNU/FSF sites were unreachable, so I reached out to a contact for details. The FSF started issuing updates about 12 hours ago:
- There is an outage of most FSF run services. We are investigating.
- Most services are still down. Staff is researching what appear they may be hardware issues related to clustered SAN storage. We will update further once staff-members are onsite at the data-center.
- Staff is on-site and has begun replacing failed disc within the SAN.
- There is not an ETA for service restoration at this time. It may be as much as an hour or two before we have additional updates, but likely a bit sooner.
- Work on this issue continues. One disc array is booting now however disc errors remain to investigate.
- Sites, etc, where they may be coming up, may be up and down as we review and potentially replace other components.
- Work continues. Logs suggest either a bad disc or the controller; the (temporarily) restored machine will come down again shortly (along with the hosts that utilize storage that SAN host provides) to enable further (physical device) testing.
- We can now estimate around 65-75 minutes for the in-fact restoration of services, where after things should begin to stabilize as hosts come up.
- The array is recovering. Multiple discs were involved however no restoration from backups has thus far been required.
Christopher Howard said, "I just want to express my appreciation for these detailed, timely outage updates over Mastodon. I always check here first for good info."
"It is not not always possible in these high-stress times. We try," the FSF said. "Thank you for commenting."
At the moment the GNU Web sites and the FSF site seem to be back online. █
