Summary: An analysis of the causes that led to the Deepwater Horizon blowup (or what failed to prevent it), based on the long inquiry
THE previous post spoke about Stuxnet, which endangers many people whose company/authority/personal computer runs Microsoft Windows. Another recent disaster where Windows got some blame was the Deepwater Horizon blowup [1, 2, 3, 4]. An anonymous Techrights contributor wrote an update on the topic — one which we publish below.
“Here’s an update on the Deepwater Horizon story,” he writes. “New testimony spurred me to look up transcripts that had not been published at the time. There were several bombshells worth sharing and thinking about. For example, Windows NT is named and shamed by the expert witness. Windows was not mentioned in most press coverage but it seems to have played a more central role than even I expected.”
Here is the report in question:
Windows NT and the Deepwater Horizon
A buggy control system left drillers and the rig blind and might even have damaged a critical safety system on the sea floor.
Microsoft Windows may have been directly responsible for the Deepwater Horizon catastrophe. Previously, Techrights showed that Microsoft Windows played a crucial role. An 824-page transcript from the July 23 Deepwater Horizon investigation has been posted and we can see that things were as Techrights guessed. Mr. Williams describes Windows NT, “a very unstable platform,” as the root cause of most problems. This buggy Windows-based control system left drillers blind when it crashed daily, was responsible for safety system bypasses, and may have destroyed the annular seal. New testimony from Andrea Fleytas, who operated the alarm panels on the doomed bridge and jumped from the flaming deck with Mr. Williams, shows that the drilling team may have had time to escape if the alarms had not been inhibited. This interpretation of her testimony, with some quotes, was published by the Times-Picayune. The consequences of this disaster and ongoing cover-up are well reported in the Florida Oil Spill Law blog.
Mr. Williams describes typical Windows problems in three identical, malfunctioning control systems, A Chair, B Chair and C Chair, on pages 42 and 101. There’s incompatibility, instability, harmful bugs and worries about viruses. On page 42, Mr. Williams talks about the systems, their importance and how broken they were.
The A-chair is located in the dog house. That is the main operating point for the driller to control all drilling functions. It controls everything from mud pumps to top drive, hydraulics. It controls everything.
For three to four months we’ve had problems with this computer simply locking up. [sometimes it was a blue screen, sometimes a frozen display] … We had ordered replacement hard drives from the manufacturer. We had actually ordered an entire new system, new computers, new servers, new everything to upgrade it from the very obsolete operating system that it was using. Those computers were actually using Windows NT, which is a very unstable platform to begin with.
Between the manufacturer and the rig, they could not get the bugs worked out of the new operating system. They couldn’t get the old software to run correctly on the new operating system. Our sister rig, the NAUTILUS, was going through those growing pains kind of for us. We had already ordered all the equipment. We were just waiting on them to figure it all out so that we could copy their learnings and make it work on our rig.
Meanwhile, we were limping along with what we had. We had ordered new hard drives. They came in. We replaced the images on the hard drives for the software imaging, got them back running, the chair would run for two, three days, and they would crash again. … I can’t tell you how many hours or days he [electrical supervisor, Tommy Daniels] spent focused entirely on getting these chairs resolved. … He was still working towards that up until the time of the explosion. It had not been resolved.
In the same discussion, Mr. Williams suggests that the failure of this system could have contributed to the blowout by referencing a previous incident.
[in another accident] It was internally discussed that the chair crashing caused the kick, because they lost all — They lost all communications to the drill package. They had no way to monitor anything for several seconds, and before they could get the B Chair up, they had taken a kick.
On pages 103 and 104, he also describes how a “blue screen of death” could lead to a “kick” while waiting for the backup system to boot and be informed by “servers”. Operators complained about this loss of control every day and it happened at all hours of the day and night.
It should be noted that the problem with the alarms was not the sensors, but it could have been viruses. Mr. Williams describes how he made sure all of the sensors were working properly on pages 66 and 68 to 70. On page 77, Williams says, “The chairs themselves were completely independent and isolated from the entire rig network, so there was no chance of infection, virus, hacking, there was no opportunity for that.” This suggests that the rest of the network had problems, which might have been carried to the control system via physical media such as USB drives or floppies.
Non-free software left BP engineers in the field divided and helpless. On page 102, Mr. Williams tells us, “There was no fixing bad software. We could simply manage it, try to keep it running.” So, BP’s management was told that all they could do was what the vendor said. Money and resources were being spent to fix the problems but they were wasted. When the vendor’s software failed, BP was stuck begging for more from a system that had to be bypassed.
Mr. Williams describes the general alarm, its inhibition and consequences starting on page 30. The whole rig was blind to real danger.
You have four states of alarms. You have a normal operating condition, you have an inhibited condition, which simply means that the sensory is active, it is sensing, and it will alarm and it will give the information to the computer but the computer will not trigger an alarm for it. It will give you the indication, but it won’t trigger the actual alarm. [other states described] …
there are several toxic and combustible gas sensors located in key areas, mainly around the drilling package. … When you get two detectors to go into a high state in one zone, what is supposed to happen is the ESD for that zone should trip, which is your emergency shutdowns [designed to prevent explosions], and you should also sound the generator alarm.
The general alarm is set up to inform the entire rig of any of three conditions. … Each one of those conditions has a distinct tone and a distinct visual light. We have light columns throughout the rig. One red — Within the column there’s a red, a yellow, and a blue, with the red being fire, yellow being toxic, blue being combustible. So you get an audio tone and a visual tone with every general alarm. [none of these were used in the accident because the computer was set so general alarms had to be triggered manually. As we will see, they failed to do this.]
… When I discovered it was inhibited about a year ago, I inquired as to why it was inhibited, and the explanation I got was that they — from the OIM down, they did not want people woke up at 3:00 o’clock in the morning due to false alarms.
On pages 40 and 41 we see that Emergency Shut Downs had been set to bypass because the system shut panels down frequently over false alarms. This left everyone at risk of explosion.
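To make the effect of inhibition and bypass concrete, here is a minimal Python sketch of the two-detector rule as Mr. Williams describes it. It is an illustration only, not the rig's actual software. The names `AlarmMode`, `ZoneResult` and `evaluate_zone` are hypothetical, only the two alarm states he spells out are modelled (the others are elided in the quote above), and treating the ESD bypass as a separate on/off flag is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class AlarmMode(Enum):
    # Only the two states described in the testimony are modelled here.
    NORMAL = "normal"        # sensor readings can trigger the general alarm
    INHIBITED = "inhibited"  # readings are indicated, but no alarm is triggered


@dataclass
class ZoneResult:
    indicated: bool      # the console shows the condition to the operator
    general_alarm: bool  # rig-wide tone and light columns sound automatically
    esd_tripped: bool    # emergency shutdown for the zone trips automatically


def evaluate_zone(detectors_high: int, mode: AlarmMode, esd_bypassed: bool) -> ZoneResult:
    """Apply the 'two detectors high in one zone' rule (hypothetical model)."""
    voted = detectors_high >= 2  # two or more detectors in a high state
    return ZoneResult(
        indicated=detectors_high > 0,
        # Assumption: inhibition blocks only the automatic general alarm.
        general_alarm=voted and mode is AlarmMode.NORMAL,
        # Assumption: the bypass blocks only the automatic shutdown.
        esd_tripped=voted and not esd_bypassed,
    )


if __name__ == "__main__":
    # As designed: two detectors high, nothing inhibited or bypassed.
    print(evaluate_zone(2, AlarmMode.NORMAL, esd_bypassed=False))
    # As configured according to the testimony: inhibited and bypassed.
    print(evaluate_zone(10, AlarmMode.INHIBITED, esd_bypassed=True))
```

Under these assumptions, the second call reports the gas on the console but neither sounds the general alarm nor trips the shutdown, so everything depends on a person acting by hand.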
On page 37, Mr. Williams drops another bombshell: the same system may have destroyed the blowout preventer without human input. A reasonable system would inhibit any motion, even human-directed motion, that would cause it to destroy itself. What they had left them wondering about everything.
it took me a few days to understand or to formulate why we were getting chunks of [annular] rubber back. There was an incident prior to that where we were in testing mode and the annular was closed around the drill pipe. I got a call from the night-time toolpusher to come investigate whether or not there was an input to the stick to hoist the block while the annular was closed, and I inquired as to why he needed to know that. He said, “Well, the block moved about 15 or 20 feet. We need to know why. We need to know if it was inadvertent stick movement or if it went up by itself.” [an informal investigation] got into the chair log data and dissected the data. What we determined was one of the sticks was moved in the positive direction. What we could not definitively determine was which stick. The tag system inside the log was not accurate enough. It simply said, “Joystick A, Joystick B,” …
All the logs prove to me is that the computer thought someone pushed the joystick. The signal was erroneous and might also have been spurious.
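The kind of interlock a more reasonable system might apply can be sketched in a few lines of Python. This is a hypothetical simplification, not a description of any real drilling control system: `RigState`, `hoist_permitted` and the convention that a positive stick value means "hoist" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RigState:
    annular_closed: bool  # True while the annular is closed around the drill pipe


def hoist_permitted(state: RigState, stick_input: float) -> bool:
    """Decide whether a hoist command should be passed on to the block.

    Assumption: a positive stick_input is a request to hoist.
    """
    if stick_input <= 0:
        return False  # no hoist was requested
    if state.annular_closed:
        # Interlock: refuse the command whether it came from a person
        # or from a spurious joystick signal.
        return False
    return True


if __name__ == "__main__":
    print(hoist_permitted(RigState(annular_closed=True), stick_input=1.0))
    print(hoist_permitted(RigState(annular_closed=False), stick_input=1.0))
```

The point of the design is that the check sits between the joystick and the machinery, so a bad signal is stopped regardless of where it came from.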
The most dreadful immediate consequence of all of this was that eleven men died in an explosion and fire. New testimony shows a situation that a more reasonable system should have been able to react to and save the day. The blowout preventer should never have been damaged. Alarms should have sounded, so people could escape. Panels and generator should have been shut down to prevent an explosion. What actually happened? David Hammer of the Times-Picayune tells us.
Andrea Fleytas said she felt the rig jolt that evening and saw more than 10 magenta lights flash on her screen notifying her that the highest level of combustible gas had entered the rig’s shaker house and drill shack, critical areas where the rig’s drilling team was at work. … she was trained to sound a general alarm any time more than one indicator light flashed, but didn’t do so immediately in this case because she had never been trained to deal with such an overwhelming number of warnings. … she eventually “went over and hit the alarms” after the first of two large explosions.
[before pushing the alarms] Fleytas received a telephone call from crew members on the drill floor who said they were fighting a kick of gas and oil in the well; she took another call from the engine control room asking what was happening and she told them they were having a well control problem; and she continued to hit buttons on her console acknowledging the multiple gas alarms popping up in various sectors of the rig. … A few seconds after she got off the telephone with the engine room, there was a blackout on the rig. A few seconds after that, the first explosion rang out, Fleytas testified. It was then that she sounded the general alarm.
Keplinger said in his own testimony that it was after the explosion when he first “noticed a lot of gas in there and called” the shaker house to try to get whoever may have been there out, but nobody answered the phone.
Fleytas said she knew of no protocols for activating the emergency shutdown and no one activated it. Gas likely ignited in the drilling area, killing everyone there, and also caused the two active engines to rev so high that all power on the rig was lost, preventing fire pumps from working and keeping the rig from moving away from the spewing well.
Microsoft’s failures did not end when the rig sank. Those trying to fix things were also burdened with second-rate software.
Since then, people from Texas to Florida have been sickened and harmed by the spill. Toxic levels of dispersant have shown up in people’s private pools; the beaches are contaminated with about 200 ppm of oil; oysters, crabs and shrimp have even more. The oil made its way into people’s blood. If the big spill in Mexico is a guide, the spill will linger for decades. █