
The Seal Failure in the SRB That Doomed Challenger (2021)

140 points | 3 months | exrocketman.blogspot.com
Paul-Craft 3 months ago

Challenger did not explode because of a "seal failure." That tragedy was entirely preventable. At least one of the engineers raised the alarm, saying that because of the recent cold weather, they couldn't guarantee that the O-ring would perform properly. But the bigwigs disregarded that warning, and seven people paid for that mistake with their lives.

No, it most certainly was not a seal that failed. It was an organization that failed. Unfortunately, it's harder to fix an organization than it is to design an O-ring that won't become brittle from sitting out a few hours on a cold night.

dghlsakjg 3 months ago

That's my takeaway after a lot of reading about this.

The seal performed exactly as it was specified to. The spec was that below certain temperatures it wasn't guaranteed to perform as a seal, and that's exactly how it performed.

The issue was that some level of management was alerted that they were operating outside of the spec, and they gambled that it didn't matter.

KennyBlanken 3 months ago

The seal did not "perform exactly as it was designed to."

The seal leaked during initial static pressure testing, it leaked during test-firings, etc. Engineers and management at both Thiokol and NASA knew about this. NASA engineers repeatedly objected to the Thiokol design (both original and modified) for different reasons, talked to the o-ring supplier who stated that the design was using o-rings in a way never used before, etc.

The author of the blog post is wildly wrong, but so are a lot of comments.

Everyone, PLEASE read the Rogers Commission report. It spells out the extensive problems with the design/manufacture, and at both Thiokol and NASA.

https://www.nasa.gov/history/rogersrep/v1ch6.htm

Edit: I can't post a response because my account has a posting limit, but "the o-rings were not defective" is...misleading.

The o-rings were constructed as an assembly that used multiple lengths of o-ring material glued together, instead of the entire o-ring being molded at once, which is what had been done on prior rockets. Up to five joints were allowed. No inspection of the glued joints was performed other than a surface inspection.

Second edit: no, the problem is not that the "selection team did not account for atmospheric pressure." The joints between sections mechanically did not hold together correctly. NASA engineers predicted this when examining the revised design during the earliest phases. The assembly turned out to behave differently from how they had predicted, but it still caused the seals to leak. Everyone knew the seals leaked, before the first shuttle headed to the launch pad.

Third edit: the blog is "wildly wrong" because it claims NASA supplied or modified the revised seal design and outright declares them incompetent government bureaucrats who didn't know how solid rocket motors worked. In fact, both designs came from Thiokol in their entirety - and NASA engineers basically said in reports something to the effect of "the government (i.e. NASA and the military) has never seen a solid rocket motor sealed like this".

When NASA engineers approached the o-ring manufacturing company, the company said they'd never seen a design like it and felt that it was 'not being used like an o-ring' or something to that effect.

From the very beginning NASA engineers were screaming their heads off that the design was shit. Testing validated their concerns. Upper management at both Thiokol and NASA didn't care.

dghlsakjg 3 months ago

I should have been more specific: the o-ring did not fail in an unexpected way.

There was no defect in the o-ring. The design of the entire joint it was sealing was suspect, and was known to perform in a way that was not satisfactory. It performed just as the people who had the technical details expected it to (they expected a failure under those conditions).

stracer 3 months ago

Why is the blog post wildly wrong? He states the seal joints were poorly designed, similar to your statement. His statements seem reasonable and in agreement with Feynman's and others' (Boisjoly) accounts, which tell a story of corrupt NASA and Thiokol management, who pushed for flying outside the safe temperature window.

romwell 3 months ago

>The seal performed exactly as it was specified to.

Yeah, no.

The seal never performed the way it was designed to.[1] It was faulty by design. The seal always leaked, but that didn't always lead to an explosion.

There were multiple times where the erosion/blow-by problem was observed; in fact, the O-ring was (unsuccessfully) redesigned by Morton Thiokol to address the issue.[2]

The problem was that nobody really understood what was going on, and waved their hands about it.

Quote[3]:

“NASA had developed a peculiar kind of attitude: if one of the seals leaks a little and the flight is successful, the problem isn’t so serious. Try playing Russian roulette that way: you pull the trigger and the gun doesn’t go off, so it must be safe to pull the trigger again.”

Please do some due diligence before simply saying things that feel right for the sake of making a point.

Your point still stands (it was an organizational failure), but premising it on a false statement (that the O-ring performed to spec) isn't a way to make it.

[1] https://www.latimes.com/archives/la-xpm-1986-11-19-mn-4295-s...

[2] https://www.nasa.gov/history/rogersrep/v1ch6.htm

[3] https://lithub.com/how-legendary-physicist-richard-feynman-h...

dylan604 3 months ago

> “NASA had developed a peculiar kind of attitude: if one of the seals leaks a little and the flight is successful, the problem isn’t so serious.

This quote also feels pertinent to the Starliner decision to launch. They knew there was a helium leak, but they just decided there was more helium for the mission than was leaking. So the acceptable risk bar seems to be pretty low.

bumby 3 months ago

It's prevalent all over the aerospace industry. It's also why Columbia was lost (foam shedding happens but it's "in family", i.e. it's a known issue that hasn't prevented a flight from being successful).

Complicated systems are always having some issues that aren't to spec. The difficulty is assigning an appropriate risk to them.

I've heard of people who design 5+ sigma aircraft and refuse to fly on them because they know of some system that isn't up to spec in their eyes; in that case, the opposite is true: they're assigning a risk that is probably too large given the data.

dghlsakjg 3 months ago

You wrote this after my other reply clarifying my point to a sibling comment that addresses exactly what you are saying, so please go back and read that.

The gist of it is that the failure was expected by everyone with technical knowledge of the seal.

romwell 3 months ago

Maybe read what you wrote instead? The words, not the gist.

>The gist of it is that the failure was expected by everyone with technical knowledge of the seal.

Nobody is denying that.

What you wrote was, quote: "The seal performed exactly as it was specified to."

That statement is false. It was not performing as specified at warmer temperatures either.

m3kw9 3 months ago

The selection team for the seal failed to account for the temperature ranges of Earth's atmospheric conditions. They went low and that is what happened.

sjm-lbm 3 months ago

To be fair, there were specific temperature ranges in which the shuttle was supposed to be capable of launching, and the temperature range did not go as low as you might expect because the Shuttle was going to launch in Florida (and, maybe someday, California) - iirc, the minimum temperature was 40 deg F or something like that.

Of course, the Thiokol engineers weren't sure that 40 was sufficient (they were worried about anything below 52 degrees), but in defense of the people that chose the o-ring material, Challenger launched outside of the design spec for the Space Shuttle.

dhc02 3 months ago

Your initial statement is a particularly bad sort of semantic mishandling.

The seal absolutely failed, in the way that any reasonable person would interpret that phrase. A seal prevents things from getting past it, and it did not do that.

Some alternate phrasings that I believe make the (valid) point without this semantic flaw:

- The Challenger did not explode because of a simple or unexpected seal failure.

- The seal failure was merely a symptom of a larger, harder to fix, much more troubling failure.

- The seal failed, but that was not the failure that mattered.

bumby 3 months ago

>couldn't guarantee that the O-ring would perform properly.

Small nit, but from a reliability perspective you can almost never guarantee something will work. It's a probability distribution but, to your point, you need decent data to estimate that probability.

The other issue is a psychological one. Humans aren't generally wired to think about probabilities well, especially low-probability events. There's a fairly decent chance you can roll the dice over and over on these types of decisions and not have anything bad happen, leading you to think you're good when in reality you're just lucky.
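
The "just lucky" effect is easy to put in numbers. A minimal sketch, assuming every flight fails independently with the same probability (a big simplification) and using made-up per-flight probabilities purely for illustration; 24 is the number of Shuttle flights before Challenger's last one:

```python
def prob_all_succeed(p_fail: float, flights: int) -> float:
    """Chance that `flights` independent launches all succeed."""
    return (1.0 - p_fail) ** flights


# Hypothetical per-flight failure probabilities, not measured values.
for p_fail in (0.01, 0.04, 0.10):
    lucky = prob_all_succeed(p_fail, 24)
    print(f"p_fail={p_fail:.2f}: P(24 clean flights) = {lucky:.2f}")
```

Under those assumptions, even a 4% chance of disaster per flight produces a clean 24-flight record more than a third of the time, so the clean record by itself says very little.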

vikingerik 3 months ago

However, do keep in mind the principle you need to think about when designing such an organization, which is warning fatigue.

Anything as complex as the shuttle is going to have any number of groups raising any number of warnings about any number of components. If you heeded all of them, you'd spend eternity investigating everything and nothing would ever fly (welcome to the SLS.) How many other warnings of similar perceived severity were ever raised but flew anyway and never resulted in anything catastrophic? Probably a lot. "Go fever" is a problem, but so is its opposite in warning fatigue.

It's easy to condemn the organization, but that has to also come with some sense that the organization had a million problems to deal with, and we only knew to pick out this one after it happened.

pixl97 3 months ago

Eh, this is why the space shuttle sucked and was a stupid design that was going to kill people. It could never fly without people so every possible flight test was a fatal one. It was too expensive so no one wanted to 'test' it. It was such an incredibly complicated package that critical issues that would ground any other project got bypassed because of the sunk cost fallacy.

SpaceX generally does the opposite of this for example. Their testing is very hardware rich. Then they'll have dozens of automated flights on very similar hardware. Then after the hardware has been used in a wide range of conditions we see it migrate to human ratings.

NASA picked something too complicated, was warned it was too complicated, then lost human lives when it was too complicated.

shiroiushi 3 months ago

As I understand it, NASA basically was forced down that path, by the military. The military wanted a spacecraft that could take secret spy satellites to orbit, and also be able to retrieve them and bring them intact back to Earth. That forced the Shuttle design.

fuzzfactor 3 months ago

There had been delays, and as a live TV watcher I could see there was pressure to launch as soon as conditions could be considered the least bit acceptable.

IOW almost completely unfavorable, but not quite as bad as it was earlier.

The media revealed some skepticism that a launch would be advisable that day, but once a lift-off time was set, then it became all systems go as usual.

If there's a freezing day on that part of the Florida coast, that's an unusual year. You question everything that hasn't previously survived that kind of year in the past.

I didn't feel optimistic for those reasons alone.

>The “science” is that knowledge which was written down. The “art” is the knowledge that was not written down, usually because no one wanted to pay for the writing.

That's so true, and it's why so much documentation is never made: you have to make the most of what you have and fill in the rest through experimentation.

But O-rings are so boring, seems like nobody wants to take the time to even read the "free" literature. It's not any more complex than an average engineering semester.

Now it would take more than a semester to get really deep into polymer properties that can be involved under different conditions, but engineers themselves are never expected to get very far in that direction if they're not even experts in the mechanical engineering of the o-ring dimensional enclosures.

Once it was revealed that the Challenger was doomed by inadequate o-ring engineering, it reminded me of the day one blew out when I had my first gas lab, at over 10,000 psi the explosive force was easily noticeable. From quite a few doors down.

The engineer who designed the cylinders had "copied them from XXXX lab" and we were using them no differently than they were doing, but it was always an accident waiting to happen because the tolerances and material selection were given but a fraction of the attention necessary to avoid a mission-critical failure, much less a potentially hazardous aftermath. Quintessential technical debt.

Anyway I had to redesign the cylinders and deal with the machinists and suppliers myself. I guess maybe it was a bit like artistic background that was helpful since I had already worked in one of the highest-precision machine shops during summer, and then after university full-time (12-hour days) at a polymer plant laboratory. What it brought to the table was greater than the available documentation which was essential too. Before it was over I had then spent time with a life-long o-ring expert who had built a company based on o-rings for severe service. There is no substitute for a large warehouse filled with nothing but millions of o-rings, and browsing around with the pro who has helpful advice at every turn and truly wants you to never have a blowout as if your life depended on it.

So the o-ring blowout had been my initiation into a commercial laboratory startup back in 1980, building custom engineering laboratories to handle contract research projects.

Definitely a single point of failure which is worse than others because it is so boring, it is much more likely to be overlooked.

Things I won't work with: engineers that are not so great but think they are.

Much better off handling things like benzene, methanol, or sulfuric acid in shorts and flip-flops, which people know not to try at home.

cryptonector 3 months ago

TFA explains that the seal design was flawed. The cold helped, but the design had already failed in earlier flights, and that had been observed. It really was a seal failure.

hilbert42 3 months ago

"...the crew did not die in the tank explosion and subsequent ripping-apart of the orbiter by air loads. ...The crew was still alive in the orbiter cabin until it finally hit the sea, which is about a 200-gee stop, since it hit dead broadside."

Anyone around at the time vividly remembers this horrible tragedy. My memory was unexpectedly reinforced only days later when I came across a memorial to the crew in the Smithsonian museum.

Perhaps those in NASA were aware that the crew were (or would have been assumed to have been) alive until they hit the water, but if I recall, that knowledge wasn't available to the GP.

I assumed, like I suppose many, the crew were killed outright at the time of disintegration and that would have been the most merciful outcome. That it wasn't even now fills me with horror and I shudder to think about it. The crew's final moments must have been sheer terror.

gwd 3 months ago

> That it wasn't even now fills me with horror and I shudder to think about it. The crew's final moments must have been sheer terror.

Having been in a situation where I thought I was about to die (head-on collision on an icy country highway), that wasn't my experience. When I saw the other car in my lane I was hit with a wall of adrenaline; I experienced the "bullet time" that you see sometimes in movies, where tiiime slooowwws doooowwwwnnn; and I felt like I was processing things calmly and rationally, thinking about what they tell you to do in that situation in driver's training, and even when it became clear that there was nothing I could do to avoid the oncoming car, I simply thought, "I wonder what it's like to die?"

(Now having been in a few more accidents, and experiencing "crumple zones", I'd probably just be thinking about whether the car would be totaled.)

The professional astronauts will have been put through loads of training for all kinds of contingencies; there's a very good chance they had similar responses, and a decent chance that McAuliffe did too.

MadnessASAP 3 months ago

They did, many of the controls in the recovered cabin were found in positions that correspond with the crew following emergency procedures. Obviously with the rest of the orbiter missing those procedures were useless, as would've been anything else they attempted.

Which makes it little more than a sad footnote to the whole thing: there was nothing the crew could've done, but they died knowing they were in big trouble and were trying to restore control of a scrap of their former vehicle.

KineticLensman 3 months ago

> I assumed, like I suppose many, the crew were killed outright at the time of disintegration and that would have been the most merciful outcome

The breakup was certainly slower and less directly destructive for the crew than the Columbia. Challenger essentially broke up due to aerodynamic stresses as the entire stack tumbled when the SRB broke loose (the stack rotated so the orbiter was on top, at Mach 1.92). The massive visible 'explosion' was actually the fuel from the external tank igniting as a result of the tank breaking up mechanically, not the initial cause of the orbiter's destruction. The separated but mainly intact crew compartment then continued upwards ballistically before beginning its long fall back to the ocean.

In contrast, due to earlier launch-time damage to its wing, Columbia entered a flat spin while re-entering hypersonically at Mach 15. Complete break-up was about 20 seconds after the last comms from the crew. By this time the crew were already dead, due to physical trauma from being violently buffeted around - e.g. their non-conforming 'fishbowl' helmets offered no real head-protection in the event of violent movement.

hilbert42 3 months ago

Thanks for that. I presume on both accounts NASA is much more conversant with the details than are publicly available. That's how it ought to be out of respect for the families.

KineticLensman 3 months ago

> I presume on both accounts NASA is much more conversant with the details than are publicly available. That's how it ought to be out of respect for the families.

Yes, and I agree. The Columbia break-up left fragments scattered over a massive area and from the fact that parts of spacesuits, belts, seats etc are scattered sometimes miles apart, it's clear that the breakup was massively destructive. But in the otherwise very detailed reports, the 'medical' details are (rightly, I think) redacted, as you say, out of respect.

dghlsakjg 3 months ago

The crew were likely alive, but not conscious.

The crew capsule experienced high g-forces on multiple axes pretty quickly after the explosion.

That's enough force to put even the best fighter pilots to sleep VERY quickly.

NASA also concluded that the emergency air system wouldn't have been sufficient to maintain consciousness with a complete loss of cabin pressure at their altitude. (https://www.nasa.gov/missions/space-shuttle/sts-51l/challeng...).

The emergency air system was designed for an evacuation on the ground, not at altitude.

mihaaly 3 months ago

The overuse of emphasization (heavy phrases, all caps, underline, bold, italic, exclamations, COMBINED!! :) ) for the sometimes later coming clear statement of facts (not needing emphasis because speak for themselves) is an irritating read. Aggravated by tangential updates in the prime location at the start (instead of the end), derailing attention right before getting started. Educationally the style is very obstructive. But still pretty useful writing after pushing ourselves through, even after decades and thousands of articles on this topic.

romwell 3 months ago

>The overuse of emphasization (heavy phrases, all caps, underline, bold, italic, exclamations, COMBINED!! :) ) for the sometimes later coming clear statement of facts (not needing emphasis because speak for themselves) is an irritating read

Some of us are neurodivergent, and this flow both makes it easier to follow, as well as more closely reflects the way we think (and write).

You can enable your browser's reader mode to remove formatting.

We can't hit a button to add emphasis in relevant places.

Consider this next time you're tempted to comment on formatting (including that of this comment).

mihaaly 3 months ago

Assisting some is nice but not at the expense of others, this is not the way.

If this article was solely for the neurodivergent then disregard my comment and I turn away for good, but if not then can you please point at the button in reader mode that removes heavy phrases and all caps and combined bold+italic+underline and put the update to the end of the article please? Thank you!

fgd135 3 months ago

Reader mode does not remove the formatting, at least on Firefox.

vrinsd 3 months ago

Anyone who wants a genuinely detailed treatment of this subject should read Allan McDonald's book "True, Lies O-rings" *. I happened to have finished this a few weeks ago and it goes on my list of "all engineers should read this".

This was really one of the most fascinating books I've read and likely the most definitive treatment of the subject by a subject matter expert. I kind of skimmed the blog article; the book explains in critical detail the issues with the original design and why the re-design (done after the disaster) was a much more robust approach.

In a nutshell the Shuttle SRB field-joint design was taken from a Titan missile design that was deemed to be "solid engineering" because none had blown up, but Allan mentions the SRB field-joint was flawed from the start and the joints suffered rotation and physically moved / flexed. (Later, it turns out a Titan missile exploded and the teardown showed the o-rings were a primary point of failure).

Allan mentions it was the blowby past the o-rings that was consistently the issue and the engineers wanted to understand and address this problem for a long time.

What was striking to me, beyond the technical aspects of making these things work, is the actual cover-up and the attempt by NASA and Thiokol to blame McDonald and others for the resulting disaster. I knew of some parts of this, but you don't realize how messed up the situation was/is until you read the book.

Personally I'd ignore any negative reviews of the book, I think non-engineers, especially those who haven't worked in an Aerospace/Defense environment or in a big company might think Allan is arrogant or boasting, but he starts by providing the foundation for his statements before getting into the details which is a classic "engineer's engineer" way of thinking.

* https://www.goodreads.com/author/show/2101296.Allan_J_McDona...

bayouborne 3 months ago

quick typo correction, 'Truth, Lies, and O-Rings'

cryptonector 3 months ago

This is the best explanation I've seen yet.

The summary is that you want a buffer of air between the hot gasses of the running motor and the o-ring seal such that the air gets compressed but remains between the hot gasses and the o-ring, thus insulating the o-ring from the hot gasses. But to avoid a pressurization test NASA went with a two-o-ring scheme where the space between them is pressurized, which forces the inner o-ring to be on the wrong side of where it should be, thus leaving little or no air buffer between that o-ring and the hot gasses. That in turn can cause point failures in the inner o-ring which will result in concentrated jets of hot gasses impinging on the outer o-ring which then cannot hold (because the pressure isn't uniform across the o-ring's circumference, instead it's concentrated on a point). Add the cold o-ring brittleness and boom.

Is SLS still using this o-ring design?? I sure hope not.

KennyBlanken 3 months ago

It was revised after the Challenger disaster.

It's not the "best explanation yet" - it's outright historical revisionism that directly contravenes reams of documentation, test results, and testimony as part of the Rogers Commission.

The author repeatedly claims NASA created the design of the inter-section seal, that they were incompetent in solid rocket motor design, insisted on the design, etc because Those Silly Government Workers blah blah.

In reality, NASA engineers saw Thiokol's initial bore/face seal design (which was also a double seal...) and objected to it:

> The initial Thiokol design proposal was changed before the production motors were manufactured. Originally, the joint seal design incorporated both a face seal and a bore seal.16 (Figure 1.) However, the motor that was eventually used had double bore O-rings. The original bore seal/face seal design was chosen because it was anticipated that it "provides [better] redundance over a double bore ring seal since each is controlled by different manufacturing tolerances, and each responds differently during joint assembly." 17 Because the early design incorporated tolerances similar to the Titan and it also incorporated a face seal, Thiokol believed it possessed "complete, redundant seal capability."

> Nevertheless, as the Solid Rocket Motor program progressed, Thiokol, with NASA's concurrence, dropped the face/bore seal design for one using a double bore seal (Figure 1). NASA engineers at Marshall said the original design would have required tapered pins to maintain necessary tolerances and assure enough "squeeze" on the face-sealing O-ring.19 However, design analysis determined that motor ignition would create tension loads on the joint sufficient to cause the tapered pins to pop out. Solving that would have meant designing some type of pin-retainers. Moreover, the rocket assembly was much easier with the dual bore seals. Because inspections and tests had to be conducted on the Solid Rocket Motor stack, horizontal assembly was required.

The seal design was still not adequate and NASA engineers objected more. Testing showed NASA engineers' concerns to be valid. During both static pressure testing and actual test-firings there was documented leakage, and Thiokol's response was to revise what was considered an acceptable amount of leaking.

Numerous NASA engineering staff, and even the o-ring suppliers, all objected. Thiokol repeatedly ignored those objections, as did NASA management.

You can read it right here: https://www.nasa.gov/history/rogersrep/v1ch6.htm

The entire premise of the blog post is contravened by early tests. Whereas the author claims that the seal was doomed to fail because the o-ring would leak until it was forced upward to where it would properly seal, the actual problem was:

> Although the test was successful in that it demonstrated the case met strength requirements, test measurements showed that, contrary to design expectations, the joint tang and inside clevis bent away from each other instead of toward each other and by doing so reduced, instead of increased, pressure on the O-ring in the milliseconds after ignition.26 This phenomenon was called "joint rotation." Testifying before the Commission, Arnold Thompson, Thiokol's supervisor of structures, said,

> "We discovered that the joint was opening rather than closing as our original analysis had indicated, and in fact it was quite a bit. I think it was up to 52 one-thousandths of an inch at that time, to the primary O-ring."27

> Thiokol reported these initial test findings to the NASA program office at Marshall. Thiokol engineers did not believe the test results really proved that "joint rotation" would cause significant problems,28 and scheduled no additional tests for the specific purpose of confirming or disproving the joint gap behavior.

csours 3 months ago

It seems like their main complaint is with NASA management - though it did sound like it bled over onto engineering as well. That feels like a very natural human response - to paint with a wider brush than is needed.

yodelshady 3 months ago

Good article, I've seen this covered from the materials science and system engineering perspective before but not the mechanical perspective.

Ask any first year materials science graduate how Challenger failed and they'll confidently tell you about glass transition temperatures in fluoropolymers, but if any chartered engineer gave you that answer, fire them. People in the room at the time knew about that, but somehow a clear warning became a point of uncertainty became a minor interest became a footnote.

What I find more interesting is, ask any first year economist about 2008 and they'll tell you about Gaussian risk copulas. Somehow in that field sticking with the level one explanation as if the PhDs in the room there didn't know is accepted.

incorrecthorse 3 months ago

> Those two flight deck pilots had breathed-up all the oxygen in their breathing packs by the time they hit the sea, something confirmed by the empty breathing packs that were recovered. Which means they were alive when they hit the sea!

I don't understand how this follows. The best scenario is that they had their last drops of oxygen around hitting the sea; in other scenarios they died from lack of oxygen before hitting the sea.

KineticLensman 3 months ago

> The best scenario is that they had their last drops of oxygen around hitting the sea; in other scenarios they died from lack of oxygen before hitting the sea.

See [0] for a summary. It appears that at least one unidentified crew member activated the air pack for Smith (the pilot) but not Scobee (the commander). Smith operated some switches after the break-up so was certainly conscious. The crew compartment was tumbling but not so fast as to cause blackouts.

[0] https://en.wikipedia.org/wiki/Space_Shuttle_Challenger_disas...

1659447091 3 months ago

Here is a link with more on the Personal Egress Air Packs, and on a crew member activating Smith's PEAP

[0]: https://en.wikipedia.org/wiki/Personal_Egress_Air_Pack

KennyBlanken 3 months ago

Note that the Challenger crew were not wearing Launch Entry Suits like those shown in the photo.

They were dressed in what amounted to nylon jumpsuits and motorcycle helmets.

inglor_cz 3 months ago

"in other scenarios they died from lack of oxygen before hitting the sea."

If they ran out in the last 4 km of altitude or so, they would be in air dense enough not to even lose consciousness.

krisoft 3 months ago

> they would be in air dense enough not to even lose consciousness.

Assuming that they don't need to do any action to change from bottle oxygen to external. Or that if action is required (like turning a valve or opening their visors), that it was performed by them.

I do not know how that subsystem worked. Maybe someone else here knows?

simple10 3 months ago

In my college statistics class, we learned MATLAB and R using data from the Challenger to recreate why the engineers thought the o-ring might fail and raised the warning. I don't remember the specifics of all the stats, but it was fascinating and the data is publicly available.

Here's a blog that shows some of the analysis (from google search): https://byuistats.github.io/Statistics-Notebook/Analyses/Log...

bumby 3 months ago

I wonder what kind of effect that analysis would have on the launch decision. Here's how I would expect it to go:

Engineer: The logistic regression model shows there is a 99.96% chance of failure at 31 deg

Manager: How good is the model?

Engineer: p=0.1157

Manager: So it's not statistically significant by our standards. In other words, it's not a very good model.

Engineer: Yeah, but still.

Manager: Well, what does the model say the temperature needs to be before we're more likely to have a successful launch than not?

Engineer: 64.8 degrees.

Manager: I can't defend pushing the schedule to the right that far based on a statistically insignificant model.

(It doesn't mean the manager was right, but I think it highlights hindsight bias. The model only seems good in hindsight because it aligns with the outcome we witnessed and not because it's a really good model. The managers at the time didn't have the benefit of hindsight.)
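
For anyone who wants to check the arithmetic in that exchange, here is a minimal sketch of how a fitted logistic curve turns a temperature into a failure probability, and how the 50% crossover falls out of the coefficients. The intercept and slope below are assumptions, not a fit: they were picked only so that the curve reproduces the two figures quoted above (about 99.96% at 31 degrees and a crossover near 64.8 degrees). The function names are made up for illustration.

```python
import math

# Assumed coefficients (NOT fitted here): chosen so the curve reproduces the
# two figures quoted in the comment above.
INTERCEPT = 15.0429
SLOPE = -0.2322  # change in log-odds per degree Fahrenheit


def failure_probability(temp_f: float) -> float:
    """Logistic curve: modelled P(O-ring distress) at a launch temperature."""
    z = INTERCEPT + SLOPE * temp_f
    return 1.0 / (1.0 + math.exp(-z))


def crossover_temperature() -> float:
    """Temperature at which the modelled probability is exactly 50% (z = 0)."""
    return -INTERCEPT / SLOPE


if __name__ == "__main__":
    print(f"P(failure at 31 F) = {failure_probability(31):.4f}")
    print(f"50% crossover      = {crossover_temperature():.1f} F")
```

Note that this only evaluates the curve. It says nothing about how well the curve fits the underlying flight data, which is the manager's objection in the dialogue.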

Ringz3 months ago

From the Wikipedia Article:

"Modified SR-71 Blackbird ejection seats and full pressure suits were used for the two-person crews on the first four Space Shuttle orbital test flights, but they were disabled and later removed for the operational flights." [1], p. II-7

But I think this would not have helped the astronauts on the middle deck.

[1]: Jenkins, Dennis R. (2016). Space Shuttle: Developing an Icon – 1972–2013. Specialty Press. ISBN 978-1-58007-249-6.

sitharus3 months ago

Indeed it wouldn’t, which is why they were removed when they started putting astronauts in the middeck seats.

dboreham3 months ago

Interesting reading. Did not glean from multiple books and documentaries that they "added another layer of turtles" to the O-ring design.

lupusreal3 months ago

Good explanation. Too many people think the cold launch day and NASA culture that allowed a launch on such a day was the only issue. It's less widely known that the design was fucked from the start and wasn't working properly for any of the Shuttle launches.

pdonis3 months ago

Also, there was another very poor decision by NASA that the article does not mention. The summer before the Challenger launch, Thiokol, at the urgent request of its engineers, sent NASA a memo stating that the SRB O-rings were not sealing properly and recommending that all Shuttle flights be stopped until the issue was understood and fixed. NASA's response was to reclassify the SRB O-rings as a Criticality 1 flight risk instead of Criticality 1R--1R means the issue could cause loss of vehicle and loss of crew if it happens, but there is a redundant backup, whereas 1 means no redundant backup, which means the Shuttle should indeed have been grounded--and then waived the risk so the Shuttle could continue flying. So NASA not only put a very poor design into use, they kept on using it even after their own flight risk procedure told them they should stop.

KennyBlanken3 months ago

You're mixing up events. The reclassification happened in 1980 (correction: 1982, I can't math), ~4 years prior:

"A second major event regarding the joint seal occurred in the summer of 1982. As noted before, in 1977-78, Leon Ray had concluded that joint rotation caused the loss of the secondary O-ring as a backup seal. Because of May 1982 high pressure O-ring tests and tests of the new lightweight motor case, Marshall management [126] finally accepted the conclusion that the secondary O-ring was no longer functional after the joints rotated when the Solid Rocket Motor reached 40 percent of its maximum expected operating pressure. It obviously followed that the dual O-rings were not a completely redundant system, so the Criticality 1R had to be changed to Criticality 1.53 This was done at Marshall on December 17, 1982. The revised Critical Items List read (See pages 157 and 158)"

https://www.nasa.gov/history/rogersrep/v1ch6.htm

Also, the issues with the o-rings not sealing properly were known from the very beginning. Thiokol's seal design was extremely unorthodox, NASA engineers objected before the contract selection, and testing showed substantial leakage during both static pressure testing and actual firings. NASA management ignored all this, and Thiokol insisted it wasn't a problem and re-wrote the pass/fail standards in terms of leakage.

There were numerous problems; the o-rings were glued instead of molded as they had been on the Titan, the boosters were assembled horizontally (something that had never been done before - certainly not on the largest solid rocket motor ever built), the o-ring assemblies were not inspected for voids...the list of incompetence just goes on and on.

Really, people: read the report.

pdonis3 months ago

> The reclassification happened in 1980

Actually, the reclassification happened in December 1982, according to the Rogers Commission report that you reference. (The original classification as 1R happened in 1980; the reclassification that removed the "R" happened in 1982.) Which I agree is not the summer prior to the Challenger launch; I was misremembering that part. But the rest of what I said--Thiokol recommending to NASA that the Shuttle be grounded until the O-ring issue was fixed, and NASA refusing--did happen the summer prior to the Challenger launch (summer 1985).

> the issues with the o-rings not sealing properly were known from the very beginning

The fact that the design was unorthodox was known, yes. The extent to which that design would lead to actual events in actual flights was not. The Thiokol engineers only gradually learned what the extent of the actual flight risk was as they analyzed flight data. A good account, which includes a brief description of how the design was flawed, the efforts made by the Thiokol engineers to analyze flight data and to obtain test data on the O-rings, the information sent by Thiokol to NASA in the summer of 1985, and an account of the conference call the night before the Challenger launch, is given in this paper co-authored by Roger Boisjoly:

https://people.rit.edu/wlrgsh/FINRobison.pdf

To be clear, none of this means the design should have been accepted in the first place; clearly it shouldn't have been. I am simply pointing out that the article under discussion in this thread leaves out further points in the process, besides the original design choice and the conference call the night before the launch, where NASA was given strong indications that they should change their minds, and they never did.

KennyBlanken3 months ago

You're right, I corrected the date while I was drafting it and I only changed the date, not the range (note I said "4 years", Challenger was in 1986).

I will update.

kjkjadksj3 months ago

Has anything changed with the structure of NASA today to prevent these same perverse incentives from emerging on future missions?

QuadmasterXLII3 months ago

Absolutely not – see the decision to put astronauts on the latest Starliner flight

toss13 months ago

Yes, what I found excellent about this explanation is how the system is actually dynamic, but NASA treated it as static.

In the reference good design, the single O-Rings must dynamically seat against the outermost surface from the forces in the initial leak test, and that seating is reinforced by the actual flight pressures. This also relies upon the free path of hot gasses from the hot pressurized combustion side to apply equal pressure around the ring and compress the now-hot air against the ring and steel walls, which act as a heat sink, and achieve 1/1-million failure rates.

In contrast, it looks like NASA treated it as a static system, just adding more "sealant" and O-Rings to the system, which actually forked-up the dynamics, forcing a single-point breakthru of the O-ring. And worse yet, they magnified the problem in the "fix", and the only reason it didn't happen again is they never launched in such cold temps again.

Also particularly sad to see the failure obvious in Pic #4, with hot gasses expelling from the SRB side even before it leaves the pad. They were already doomed. And even if someone somehow saw that failure happening on the pad, could anything have been done? A way to separate them from the main structure early and abort? Separate the shuttle from the main tank and abort?

Vecr3 months ago

There's no abort system at that point. They probably could have saved Columbia though.

jccooper3 months ago

Using a segmented solid was dumb from the start. Not to mention using a solid at all on a supposedly-reusable and/or human-carrying vehicle.

vundercind3 months ago

The SLS uses old Shuttle SRBs.

I assume the same ones with the post-Challenger 3-ring redesign that doesn’t fix the core problem at all.

Jesus. Add it to the list of safety-related reasons I hope that nonsense project never makes a crewed flight.

cryptonector3 months ago

After reading TFA I also have this question: are the SLS boosters also using this 3 o-ring design?

jccooper3 months ago

SLS uses improvements made late in the Shuttle program. Lower temp materials and some larger diameters. Presumably that exact problem is fixed. Others? Who knows.

phongn3 months ago

A combination of events conspired to cause the seal failure and blow-through, yes. But it wasn’t just Max-Q that defeated the slag, but that they experienced much stronger high altitude winds in flight than any flight before or after.

They rolled 20 at liftoff and then a 1 at altitude and that’s all she wrote.

NASA came close to getting away with it. They were due to introduce a redesigned field joint in late 1986 based off the lightweight solid boosters planned for the USAF missions out of SLC-6. Had they launched Challenger a day earlier or later they might have proceeded to the new joint, and no catastrophic seal failure.

csours3 months ago

I just finished "The Undoing Project" - about Kahneman and Tversky. It covers quite a lot of territory, but the title is about mentally 'Undoing' disasters.

Generally speaking, people pick a proximate human action or inaction as the keystone for preventing the problem.

In the book, they give the example of going back in time to kill Hitler - but people often don't decide to go back in time to buy Adolf's art - and then one of them suggests that even something as small as another sperm or no sperm 'winning' that particular race would disrupt history just as well.

It is much more satisfying to think about killing Hitler than it is to think about throwing a rock at his parents' window.

---

It is much more satisfying to think about NASA administrators taking the warnings seriously than it is to think about all the ways the culture and incentives were messed up. You can see a particular decision that was WRONG.

Finding fault with a person is a shortcut to mental satisfaction, but it will only at best fix one problem, and at worst will find the person who 'rolled the dice' wrong, or who picked the wrong lottery numbers. That is, you can find a person who was standing next to the cause of the problem, but any other person in that same spot would have the same odds of causing the same problem.

---

I've also been thinking about learning organizations - any org that wants to accomplish really big things has to be able to learn.

I'd love to hear of any personal experience of contracts that allow for learning. I think it's possible, but usually discouraged because contracts are written defensively, and learning involves a great deal of trust.

It's very clear in this case that NASA culture was deeply cynical and brittle. As a government organization they felt they could not show any failure or waste, and this must certainly have wormed into their group and personal psychology.

In contrast, SpaceX has demonstrated what a learning organization looks like - it looks like public failure. I emphasize IT LOOKS LIKE public failure. Learning means not being embarrassed about test rockets blowing up spectacularly. It means that you collect your data and improve, and try again.

To be sure, this would not work as well (or at all?) with a publicly traded company, and it certainly would not work with a government organization.

pixl973 months ago

This is a really good point. The shuttle was doomed to be a failure before the first part was produced. A whole bunch of different organizations wanted a piece of the pie and were forcing changes on the design that turned it into a pasted together swiss army knife. If you said no to any of those organizations, they would have fought against your funding.

Space is an arena that needs a lot of hardware rich testing to see what works and what fails and fast turn around times.

WalterBright3 months ago

I'm always amazed at how those big rocket engines, with all that heat, pressure, vibration, etc. do not just blow up but are actually light enough to fly. I see the bells on the Saturn V engines glowing yellow and orange and how in hell does that hang together!

I don't know how I could design such a thing, because my spidey sense says "it'll never fly!"

As for being an astronaut, nope. I quote Gimli, the first astronaut: "Certainty of death. Small chance of success. What are we waiting for?"

I remember at the time of the Challenger disaster, some of the other teacher candidates said "but they told us it was safe!" Come on, how could that giant flaming bomb ever be considered safe by a sane person.

My favorite technical book on rockets starts out saying "things that burn and explode" which in my mind is exactly what rockets are.

imemyself3 months ago

Not sure if this was posted because of the book - but a book on Challenger was released a month or two ago (https://www.amazon.com/Challenger-Story-Heroism-Disaster-Spa...).

I just finished reading and would strongly recommend it to anyone interested in Challenger or aerospace in general. One of my better reads in the last few years.

And also infuriating to read...my previous impression was that there was some concern about cold weather + the o-rings, and one guy thought they shouldn't launch.

But the management mistakes were far more grievous than I realized. There was a repeated pattern of near misses on the SRB's over the years before Challenger, and most engineers working on the SRB's felt very strongly that they should not launch. The previous coldest launch was 15+ degrees warmer than Challenger's, and came very very close to failure itself.

(And while it ended up not being what killed them, Rockwell, the folks who built the Shuttle itself, also did not want to launch, out of concerns about ice).

librasteve3 months ago

very good explanation. surely there must have been engineers on the team that knew the NASA design was unproven - by NOT whistleblowing they killed the crew as surely as anyone else - chickenshits

ceejayoz3 months ago

The engineers tried. https://en.wikipedia.org/wiki/Space_Shuttle_Challenger_disas...

> Based upon O-ring erosion that had occurred in warmer launches, Morton Thiokol engineers were concerned over the effect the record-cold temperatures would have on the seal provided by the SRB O-rings for the launch. Cecil Houston, the manager of the KSC office of the Marshall Space Flight Center, set up a conference call on the evening of January 27 to discuss the safety of the launch. Morton Thiokol engineers expressed their concerns about the effect of low temperatures on the resilience of the rubber O-rings. As the colder temperatures lowered the elasticity of the rubber O-rings, the engineers feared that the O-rings would not be extruded to form a seal at the time of launch. The engineers argued that they did not have enough data to determine whether the O-rings would seal at temperatures colder than 53 °F (12 °C), the coldest launch of the Space Shuttle to date. Morton Thiokol employees Robert Lund, the Vice President of Engineering, and Joe Kilminster, the Vice President of the Space Booster Programs, recommended against launching until the temperature was above 53 °F (12 °C).

This article confirms that:

> The decision to fly cold-soaked colder than the SRB’s had ever been tested, was also a NASA management decision. Both NASA and Thiokol engineers objected, but were over-ruled. Thiokol upper management also over-ruled their own engineers, and told NASA to go ahead and launch. Thus emboldened by Thiokol management, NASA launched the thing, thus killing its crew.

librasteve3 months ago

well, yes. Let's agree on the fact that Robert Lund, the Vice President of Engineering, and Joe Kilminster, the Vice President of the Space Booster Programs, recommended against launching ... I am saying that Robert Lund and Joe Kilminster are partly responsible for these horrible deaths and that instead of some paper recommendation they should have immediately given their resignation and made a press announcement of the situation (regardless of the consequences of breach of NDA or the ending of their careers) or otherwise prevented the launch, not least because as the senior executives with engineering oversight they would have been well aware of the organisation politics / buck passing culture

dogleash3 months ago

Whoever wrote that section on Wikipedia didn't do a great job. They're mentioned by name when they were against the launch, and they're "Morton Thiokol leadership" when they change their mind. It's technically true but misleading. I was so confused reading it I had to reference the Rogers Commission report because I thought (correctly) that Kilminster was one that faxed in a signed recommendation to launch to overrule the engineer on site.

whycome3 months ago

There was also a good chance that the flight would successfully reach orbit. Then what?

librasteve3 months ago

as set out in the article there is a 1/1,000,000 O ring related failure rate for the normal design and an unknown (but now known to be ~1/50) failure rate for the NASA specified design - should corporate / military entities risk life at 1 million to one ... that seems fair to me with the understanding of the crew - but it is definitely not good engineering practice (and is likely a criminal offence) to take an unknown risk with a human crew
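
To put those two per-flight rates side by side, here is a rough sketch of the chance of at least one O-ring related failure over a run of flights. It assumes each flight is an independent trial with the same failure rate, which is a simplification. The two rates are the ones quoted above, the 25-flight count is an illustrative assumption, and the function name is hypothetical.

```python
def p_at_least_one_failure(per_flight_rate: float, n_flights: int) -> float:
    """Chance of one or more failures in n independent flights."""
    # P(no failure on any flight) = (1 - rate) ** n, so take the complement.
    return 1.0 - (1.0 - per_flight_rate) ** n_flights


if __name__ == "__main__":
    N_FLIGHTS = 25  # assumption for illustration, not a figure from the article
    for label, rate in [("1 in 1,000,000", 1e-6), ("~1 in 50", 1 / 50)]:
        p = p_at_least_one_failure(rate, N_FLIGHTS)
        print(f"{label:>15}: {p:.6f} over {N_FLIGHTS} flights")
```

Under these assumptions the ~1/50 design has somewhat under a 40% chance of at least one failure in 25 flights, against a few in 100,000 for the 1-in-a-million design.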