A $233 Million Whoops: The Story of the Ill-Fated NOAA N-Prime Satellite
We’ve all had those moments at work where a simple mistake snowballs into a bigger problem. But for Lockheed Martin in 2003, a seemingly minor oversight during a routine procedure resulted in a major disaster – and a hefty price tag – involving the NOAA N-Prime weather satellite.
A $233 Million Mistake
The NOAA N-Prime, valued at a staggering $233 million, was under construction at Lockheed Martin’s facility in Sunnyvale, California. During a critical maneuver, the massive satellite was being rotated from a vertical to a horizontal position on a special cart. (The “turn-over cart” is a piece of mechanical ground equipment on which a NOAA satellite can be rotated as much as 360 degrees and tilted as much as 90 degrees.) Here’s where things went wrong: the 24 bolts meant to secure the adapter plate that held the satellite onto the cart had been removed and were never reinstalled. Without these crucial bolts, the N-Prime wasn’t properly secured.
A Crash Landing
As the satellite reached 13 degrees of tilt during the rotation, disaster struck. The N-Prime slipped off the cart and crashed a full meter down onto the concrete floor. Thankfully, no one was injured, but the delicate technology within the satellite wasn’t so lucky. The fall caused extensive damage to the N-Prime (later known as NOAA-19), including its structure and many of its satellite bus and instrument components.
The Costly Aftermath
Fixing the N-Prime wasn’t cheap. Lockheed Martin forfeited all profits from the project, which covered only a portion of the repairs. The rest of the bill, $135 million according to a NASA spokesperson at the time, fell on the shoulders of the US government. The incident served as a harsh reminder of the importance of following proper procedures, especially when handling multimillion-dollar spacecraft.
15% of the N-Prime Needed Rebuilding
The fall caused severe damage, with at least 15% of the satellite needing complete replacement. Lockheed Martin, acknowledging its role in the incident, pledged to rebuild the satellite at cost, forgoing any profit from the project.
Report Executive Summary
On Saturday, September 6, 2003, during an operation at Lockheed Martin Space Systems Company (LMSSC) Sunnyvale that required repositioning the Television Infrared Observational Satellites (TIROS) National Oceanic and Atmospheric Administration (NOAA) N-Prime satellite from a vertical to a horizontal position, the satellite slipped from the Turn-Over Cart (TOC) and fell to the floor. The satellite sustained heavy damage, although no injuries to personnel occurred. The exact extent of the hardware damage is still being assessed.
The operation scheduled for that day was to shim the Microwave Humidity Sounder (MHS) instrument by removing and replacing the instrument. This operation required the spacecraft to be rotated and tilted to the horizontal position using the TOC. The spacecraft fell to the floor as it reached 13 degrees of tilt while being rotated. The reason was clear from inspection of the hardware: the satellite fell because the TOC adapter plate was not secured to the TOC with the required 24 bolts.
Three days after the mishap, on September 9, 2003, Dr. Ghassem Asrar, NASA Associate Administrator for Earth Science, established the NOAA N-PRIME Mishap Investigation Board (MIB) in the public interest to gather information, conduct necessary analyses, and determine the facts of the mishap. To identify the root causes at work in the NOAA N-PRIME mishap, the MIB undertook two approaches. The first was an extensive analysis of the sequence of events prior to and on the day of the mishap; the planned operational scenario vs. the actual execution; and the planning activities, including scheduling, crew assembly, and test documentation preparation. The second approach was to use the Human Factors Analysis and Classification System (HFACS) (2000) to provide a comprehensive framework for identifying and analyzing human error. Evidence from a number of sources, including witness interviews, test and handling procedures, and project documents, was used to develop the accident scenarios and populate the HFACS model.
Proximate Cause: The NOAA N-PRIME satellite fell because the LMSSC operations team failed to follow procedures to properly configure the TOC, such that the 24 bolts that were needed to secure the TOC adapter plate to the TOC were not installed.
The root causes are summarized below along the four levels of active or latent failures as ascribed by the HFACS framework.
The TOC adapter plate was not secured to the TOC because the LMSSC operations team failed to execute their satellite handling procedures.
The Responsible Test Engineer (RTE) did not “assure” the turnover cart configuration through physical and visual verification as required by the procedures but rather through an examination of paperwork from a prior operation. Had he followed the procedures, the unbolted TOC adapter plate would have been discovered and the mishap averted. Errors were also made by other team members, who were narrowly focused on their individual tasks and did not notice or consider the state of the hardware or the operation outside of those tasks. Although the Technician Supervisor even commented that there were empty bolt holes, the rest of the team, and the RTE in particular, dismissed the comment and did not pursue the issue further. Finally, the lead technician and the Product Assurance (PA) inspector committed violations in signing off the TOC verification procedure step without personally conducting or witnessing the operation. The MIB found that such violations were routinely practiced.
The LMSSC operations team’s lack of discipline in following procedures evolved from complacent attitudes toward routine spacecraft handling, poor communication and coordination among the operations team, and poorly written or modified procedures.
It is apparent to the MIB that complacency impaired the team directly performing the operation and those providing supervision or oversight to this team. The operation was consistently characterized as routine and low risk, even though it involved moving the spacecraft. Several other adverse mental states, including fatigue and external constraints that limited the availability of portions of the crew to a half day, also may have had roles in the mishap. Incomplete coordination concerning ground equipment use and status, and late notification of operation schedules, exacerbated the lack of rigor in handling operations. Standard operating procedures contained ambiguous terminology (e.g., “assure”) and could be significantly modified using redlines for unique (one-time-only) operations. These practices were the preconditions or latent failures that promoted the mishap occurrence.
The preconditions within integration and test (I&T) operations described above existed because of unsafe supervision practices within the LMSSC project organization, including ad hoc planning of operations, inadequate oversight, failure to correct known problems, and supervisory violations.
The RTE and I&T manager failed to provide adequate supervision and repeatedly violated procedures when directing and monitoring their operations crews. Waiving of safety presence, late notification of government inspectors, poor test documentation, and misuse of procedure redlines were routinely permitted. Further, the MIB believes that planning for the lift/turnover operation was hurried and resulted in a hastily formed operations team. Although all team members were experienced and competent, this atypical mix of authority among the various roles created dynamics that were not conducive to open discussion and shared responsibility. The MIB concludes that the lack of enforcement and support by the supervisory chain concerning the roles and responsibilities of the operation team members and the hurried planning for this operation are factors in this mishap.
The unsafe supervision practices within the TIROS program had their roots in the LMSSC organization: the inadequate resources and emphasis provided for safety and quality assurance functions; the unhealthy mix of a dynamic I&T climate with a well-established program and routine operations; and the lack of standard, effective process guidelines and safeguards for operations all negatively influenced the project team and activities.
The MIB finds the LMSSC system safety program to be very ineffective. Few resources are allocated to system safety, few requirements for safety oversight exist, and little programmatic supervision was provided for the safety representatives. The I&T environment within the TIROS program is characterized by routine operations for which schedules and specific activities are frequently optimized. Such an environment requires rigorous oversight and processes to prevent overconfidence and complacency. The MIB believes that LMSSC failed to provide the organizational safeguards to prevent this and other potential mishaps, especially in key areas that regulate operational tempo, operations planning, procedure development, use of redlines, and Ground Support Equipment (GSE) configurations.
The in-plant government representation, Defense Contract Management Agency (DCMA), and the GSFC Quality Assurance (QA)/safety function failed to provide adequate oversight to identify and correct deficiencies in LMSSC operational processes, and thus failed to address or prevent the conditions that allowed the mishap to occur.
The in-house Government Quality Assurance Representative (QAR) (acting as a DCMA agent) inappropriately waived a Mandatory Inspection Point during the Saturday morning operation. Although his presence may not have prevented the mishap, the MIB believes this waiver is indicative of a failed oversight process and barrier. The MIB finds that the government quality assurance and safety oversight at GSFC were also deficient, having become issue driven due to the maturity of the project. Once issues were brought to their attention, the QA/safety personnel worked their resolution but there was very little proactive oversight, audit, inspection, etc. of the LMSSC operations. The in-house Government QAR knew of some of the problems associated with procedure discipline and safety and program assurance oversight but did not communicate them to the NASA project. Given the prevalence of some of the contractor deficiencies identified in this investigation, however, it is the MIB’s assessment that the government in-plant representative, DCMA, and the GSFC QA/Safety function should have identified and demanded correction for these deficiencies.
The Government’s inability to identify and correct deficiencies in the TIROS operations and LMSSC oversight processes was due to inadequate resource management, an unhealthy organizational climate, and the lack of effective oversight processes.
Relative to resource management, the GSFC project, in working to deal with a declining workload and resources, allowed and even encouraged trade-offs between the schedules, staffing, and milestones for the two remaining satellites in the Polar Operational Environmental Satellite (POES)/(TIROS) project. These constant and rapid trade-offs exacerbated the already fast operational tempo of the LMSSC I&T team. Organizational climate was found to be an issue, primarily in the government on-site structure. There is no Project in-plant civil servant government presence. The Project in-plant government representatives (one in quality assurance, two in I&T) were past employees of LMSSC and were hired as outside contractors by the GSFC Project. The MIB believes that their past associations with the company might precipitate undue complacency due to familiarity. Although the POES Project and the contractor track and trend closure of contractor-generated Non-Conformance Reports (NCRs) for timeliness, there is no process in place to analyze and trend NCRs for cause and to identify systemic problems. The MIB found no effective process in place to follow up on closure of DCMA-generated Corrective Action Requests (CARs), Supplier Assurance Contract (SAC) generated audit deficiencies, and action items from an external review (TIROS Anomaly Review). Likewise lacking is the government organizational oversight to monitor, verify, and audit the performance and effectiveness of the I&T processes and activities.
The MIB found the DCMA CAR assessment and reporting process and other DCMA audit processes to be deficient in identifying troubling trends in the LMSSC facility. Review of CARs indicates repeated requirement violations and bypassing of Mandatory Inspection Points by the contractor. The DCMA Technical Assessment Group (TAG) facility audits, the DCMA annual safety audits, and the DCMA facility summary reports of CARs prior to the mishap, however, all indicated a healthy facility environment, with no noteworthy problems reported. MIB recommendations to correct the findings/deficiencies above are provided in section 8 – Recommendations.
It is the MIB’s assessment that many of the findings uncovered in this mishap investigation are not specific to this mishap but are systemic in nature. A separate follow-up investigation should be conducted to further examine and characterize these systemic problems.
Delayed Launch, Lasting Legacy
The incident significantly delayed the launch of the N-Prime, originally planned for December 2007. It eventually lifted off in February 2009 as the final satellite in NOAA’s Polar Operational Environmental Satellite (POES) series.
The story of the NOAA N-Prime serves as a cautionary tale, highlighting the importance of clear communication and meticulous attention to detail in large-scale projects. It’s a reminder that even a seemingly minor oversight can have massive consequences.
I think the Russians should read this. With all the issues they have with the meteorological satellites they launch, this could be a good lesson for them. Look at the years-long delays in the Meteor series alone…and then look at the launch preparation pictures. At NASA, everyone is masked, gloved and wearing dust-free suits in a clean room. Look at Roscosmos. No clean room, techs wearing jeans and T-shirts or dirty work overalls, and in one picture you can see a pack of cigarettes in the guy’s pocket!!
Oh, dear.