Epi Data Science: A Bayesian Look at the Challenger Disaster

A Bayesian Look at the Challenger Disaster

The NASA space shuttle Challenger exploded on January 28, 1986, just 73 seconds after liftoff, bringing a devastating end to the spacecraft’s 10th mission. The disaster claimed the lives of all seven astronauts aboard, including Christa McAuliffe, a teacher from New Hampshire who would have been the first civilian in space. It was later determined that two rubber O-rings, which had been designed to separate the sections of the rocket booster, had failed due to cold temperatures on the morning of the launch. The tragedy and its aftermath received extensive media coverage and prompted NASA to temporarily suspend all shuttle missions.

Suppose that we could look back in time to examine the Challenger O-rings. We can approximate this time journey by looking at the data availabe to the Shuttle engineers at the time. There were twenty-three previous Space Shuttle flights using four separate vehicles (Challenger, Atlantis, Columbia and Discovery). Several flights were determined to have defective O-rings on post flight inspection. Using that data, I will attempt a look back at the Challenger disaster to see what effect the low temperature at the time of launch (31 degrees F) may have had on the O-rings, leading to failure.

First, load the required packages

#devtools::install_github("rmcelreath/rethinking", force = TRUE)
library(rethinking)
library(ggplot2)
set.seed(123)

The data for this analysis can be downloaded from the UCI database UCI. To make it easier, I have recompiled the data into a data frame(orings) that you can just copy.

o_rings <- rep(6, 23)
distressed <- c(0, 1, 0, 0, 0, 0, 0, 0,
                1, 1, 1, 0, 0, 2, 0, 0,
                0, 0, 0, 0, 2, 0, 1)
l_temp <- c(66, 70, 69, 68, 67, 72, 73, 70,
            57, 63, 70, 78, 67, 53, 67, 75,
            70, 81, 76, 79, 75, 76, 58)
psi <- c(50, 50, 50, 50, 50, 50, 100, 100,
         200, 200, 200, 200, 200, 200, 200, 200,
         200, 200, 200, 200, 200, 200, 200)
flight <- seq(1,23,1)
weight <- c(235415, 241664, 247112, 256744, 249178, 242742,
             247619, 250452, 254254, 263477, 242780, 263324,
             250891, 250891, 246880, 256524, 252855, 262309,
             190400, 243762, 261455, 256003, 121778)
vehicle <- c("Columbia",   "Columbia",   "Columbia",  
              "Challenger", "Challenger", "Challenger",
              "Columbia",   "Challenger", "Challenger",
              "Discovery",  "Challenger", "Discovery", 
              "Discovery",  "Discovery",  "Challenger",
              "Discovery",  "Challenger", "Discovery", 
              "Atlantis",   "Challenger", "Atlantis",  
              "Columbia",   "Challenger")            
orings <- data.frame(o_rings, distressed, l_temp,
                      psi, flight, weight, vehicle)

You can see the distribution of normal and distressed O-rings found on the four Shuttle vehicles.

ggplot(orings, aes(x = l_temp, y = distressed, color = vehicle)) + geom_point() +xlab("Launch Air Temperature") +
  ylab("Distressed O-Rings") + ggtitle("Distressed O-Rings: Pre-Challenger(STS-51-L) Flights")

Only 18 points are visible due to overlap. The normal and distressed O-rings appear to be fairly evenly distributed among the four vehicles.

Below, in the boxplots, lower temperatures are associated with more distressed O-rings. The lowest launch temperature, until the Challenger disaster (31 degrees F), was 53 degrees F.

ggplot(orings, aes(x = as.factor(distressed), y = l_temp)) +
  geom_boxplot() +
  ggtitle(label = "Challenger Disaster",
          subtitle = "DIstressed O-Rings vs. Launch Air Temperature") +
  ylab("Launch Air Temperature") +
  xlab("Distressed O-Rings (per mission)")

In order to model the relationship between launch temperature and the count of distressed O-rings, I used a Bayesian approach. We start out first with a simple model and work up in complexity. The various models are compared. The model having the posterior distribution with the highest entropy (measured by WAIC - widely applicable information criterion) will be selected for predicting O-ring damage at lower temperatures.

I selected a Poisson distribution with a log link function for use in the model. This is because we are modeling counts of distressed O-rings per flight. Use of a normal distribution would imply that there is some “population” of Shuttle O-rings from which we are predicting out-of-sample outcomes. In my opinion, a Poisson model is the most reasonable choice with these data.

I have used weakly informative priors primarily to act as a brake on overfitting. Quadratic approximation will be used for most of the models and MCMC(using Stan) for one other model.

#The simplest model. Coefficients are zero. Only the intercept is included in the model
m1 <- map(
  alist(
    distressed ~ dpois(lambda),
    log(lambda) <- a,
    a ~ dnorm(1, 100)
  ),
  data = orings) 
precis(m1)

##    Mean StdDev  5.5% 94.5%
## a -0.94   0.33 -1.47 -0.41

Now to increase the complexity of the model by adding launch temperature (bt).

#adding a coefficient for launch temperature
m2 <- map(
  alist(
    distressed ~ dpois(lambda),
    log(lambda) <- a + bt*l_temp,
    a ~ dnorm(0, 1),
    bt ~ dnorm(0,1)
  ),
  data = orings) 
precis(m2, corr = TRUE)

##     Mean StdDev  5.5% 94.5%     a    bt
## a   0.64   0.96 -0.88  2.17  1.00 -0.94
## bt -0.02   0.01 -0.05  0.00 -0.94  1.00

plot(precis(m2))

The plot for model m2 shows that the coefficient for launch temperature is -0.02 and the right tail (at 94.5%) does not go past zero.

Model m3 adds launch weight (bw) as a predictor. I have chosen to scale the weight since it is several orders of magnitude greater than the values for temperature. This will make the computation less complex and will guard against overfitting.

#adding a coefficient for weight
orings$weight_sc <- scale(orings$weight)
m3 <- map(
  alist(
    distressed ~ dpois(lambda),
    log(lambda) <- a + bt*l_temp + bw*weight_sc,
    a ~ dnorm(0, 1),
    bt ~ dnorm(0, 1),
    bw ~ dnorm(0, 1)
  ),
  data = orings) 
precis(m3, corr = TRUE)

##     Mean StdDev  5.5% 94.5%     a    bt    bw
## a   0.59   0.96 -0.94  2.13  1.00 -0.94  0.09
## bt -0.02   0.01 -0.05  0.00 -0.94  1.00 -0.06
## bw -0.07   0.28 -0.52  0.37  0.09 -0.06  1.00

plot(precis(m3))

Model m3 shows that launch tmperature retains its position relative to zero. Launch weight appears to have a negative effect on O-ring damage (more weight, less damage; less weight, more damage), but it has a high degree of uncertainty compared to temperature.

Model m4 adds scaled leak-check pressure as a variable.

#adding scaled leak-check pressure
orings$psi_sc <- scale(orings$psi)
m4 <- map(
  alist(
    distressed ~ dpois(lambda),
    log(lambda) <- a + bt*l_temp + bw*weight_sc + bp*psi_sc,
    a ~ dnorm(0, 1),
    bt ~ dnorm(0, 1),
    bw ~ dnorm(0, 1),
    bp ~ dnorm(0, 1)
  ),
  data = orings) 
precis(m4, corr = TRUE)

##     Mean StdDev  5.5% 94.5%     a    bt    bw    bp
## a   0.56   0.95 -0.95  2.08  1.00 -0.92  0.09 -0.03
## bt -0.02   0.01 -0.05  0.00 -0.92  1.00 -0.08 -0.15
## bw -0.03   0.26 -0.45  0.39  0.09 -0.08  1.00  0.09
## bp  0.57   0.44 -0.13  1.27 -0.03 -0.15  0.09  1.00

plot(precis(m4))

We can use the Shuttle vehicles as a source of group level variation that we can exploit in a multilevel model. Here we are using Stan to produce a Hamiltonian MCMC model for the four types of Shuttle vehicles.

#Multilevel model using integers for vehicles
#Data preparation for Stan
orings$vehic <- ifelse(orings$vehicle == "Challenger", 1,
                       ifelse(orings$vehicle == "Atlantis", 2,
                              ifelse(orings$vehicle == "Columbia", 3, 4)))
orings$vehic <- coerce_index(orings$vehic)
orings_stan <- orings[, c(2,3,10)]
orings_stan$distressed <- as.integer(orings_stan$distressed)
orings_stan$l_temp <- as.integer(orings_stan$l_temp)

m5 <- map2stan(
  alist(
    distressed ~ dbinom(l_temp, p),
    logit(p) <- a_vehic[vehic],
    a_vehic[vehic] ~ dnorm(0,5)
  ),
  data = orings_stan)

precis(m5, depth = 2)

##             Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a_vehic[1] -5.51   0.60      -6.35      -4.50   970    1
## a_vehic[2] -4.41   0.74      -5.63      -3.31  1000    1
## a_vehic[3] -6.05   1.00      -7.44      -4.58   750    1
## a_vehic[4] -5.05   0.62      -5.99      -4.10   835    1

plot(precis(m5, depth = 2))

There does not appear to be much variation between the Suttle vehicle groups. This would be expected since the basic design and launch physics of each vehicle would be similar.

A final area to explore is whether there is any interaction between launch temperature and weight to include in our prediction.

m6 <- map(
  alist(
    distressed ~ dpois( lambda ),
    log(lambda) <- Intercept +
      bt*l_temp +
      bw*weight_sc +
      btw*l_temp*weight_sc,
    Intercept ~ dnorm(0,1),
    bt ~ dnorm(0,1),
    bw ~ dnorm(0,1),
    btw ~ dnorm(0,1)
  ),
  data = orings)

precis(m6, corr = TRUE)

##            Mean StdDev  5.5% 94.5% Intercept    bt    bw   btw
## Intercept  0.63   0.96 -0.91  2.16      1.00 -0.94  0.00  0.03
## bt        -0.02   0.01 -0.05  0.00     -0.94  1.00  0.03 -0.05
## bw        -0.16   0.95 -1.67  1.36      0.00  0.03  1.00 -0.96
## btw        0.00   0.02 -0.02  0.03      0.03 -0.05 -0.96  1.00

plot(precis(m6))

There may be a small degree of interaction between temperature and weight, but temperature is the dominant factor here.

Now to compare the various models:

compare(m1, m2, m3, m4, m5, m6)

##     WAIC pWAIC dWAIC weight    SE  dSE
## m2  37.6   1.1   0.0   0.61  7.84   NA
## m1  39.8   1.1   2.2   0.20  7.70 1.42
## m4  41.2   3.5   3.6   0.10  8.33 3.88
## m6  42.7   3.3   5.1   0.05  9.25 3.72
## m3  43.4   3.7   5.8   0.03  9.92 5.00
## m5 117.8   4.3  80.2   0.00 32.73 8.68

Model 2 (m2) has the lowest WAIC, and is given 61% of the weight. It is also the simplest model of the group. It can now be used to predict a probability distribution for O-ring damage at lower temperatures.

The first step is to link model m2 to a lower temperature sequence. Samples from the posterior distribution are then used to calculate the percentile interval (PI). The link function computes the value of each linear model at each sample for each case in the data. Inverse link functions are applied, so that for example a logit link linear model produces probabilities, using the logistic transform.

l_temp_seq <- seq(30, 81, by = 1)
log_lambda <- link(m2, data = data.frame(l_temp = l_temp_seq))

## [ 100 / 1000 ]
[ 200 / 1000 ]
[ 300 / 1000 ]
[ 400 / 1000 ]
[ 500 / 1000 ]
[ 600 / 1000 ]
[ 700 / 1000 ]
[ 800 / 1000 ]
[ 900 / 1000 ]
[ 1000 / 1000 ]

log_lambda.mean <- apply(log_lambda, 2, mean)
log_lambda.pi <- apply(log_lambda, 2, PI, prob = 0.99)

plot(orings$l_temp, orings$distressed,
     xlim = c(30, 85),
     ylim = c(0, 6),
     main = "Distressed Shuttle O-rings vs. Temperature",
     ylab = "Number of Distressed O-rings",
     xlab = "Temperature (degrees F)",
     sub = "Shuttle flights 1 - 23")
lines(l_temp_seq, log_lambda.mean)
shade(log_lambda.pi, l_temp_seq)

The number of distressed O-rings stays near zero at temperatures above 60 drgrees F. At 31 degrees F, up to 4 distressed O-rings are within the probability interval. The mean line (the densest part of the probability interval), shows 1 distressed O-ring at 31 degrees F).

Epi Data Science

Sunday, December 3, 2017

A Bayesian Look at the Challenger Disaster

A Bayesian Look at the Challenger Disaster

David P. Nichols, MD, MPH

November 25, 2017

First, load the required packages

The data for this analysis can be downloaded from the UCI database UCI. To make it easier, I have recompiled the data into a data frame(orings) that you can just copy.

You can see the distribution of normal and distressed O-rings found on the four Shuttle vehicles.

Only 18 points are visible due to overlap. The normal and distressed O-rings appear to be fairly evenly distributed among the four vehicles.

Below, in the boxplots, lower temperatures are associated with more distressed O-rings. The lowest launch temperature, until the Challenger disaster (31 degrees F), was 53 degrees F.

I have used weakly informative priors primarily to act as a brake on overfitting. Quadratic approximation will be used for most of the models and MCMC(using Stan) for one other model.

Now to increase the complexity of the model by adding launch temperature (bt).

The plot for model m2 shows that the coefficient for launch temperature is -0.02 and the right tail (at 94.5%) does not go past zero.

Model m3 adds launch weight (bw) as a predictor. I have chosen to scale the weight since it is several orders of magnitude greater than the values for temperature. This will make the computation less complex and will guard against overfitting.

Model m3 shows that launch tmperature retains its position relative to zero. Launch weight appears to have a negative effect on O-ring damage (more weight, less damage; less weight, more damage), but it has a high degree of uncertainty compared to temperature.

Model m4 adds scaled leak-check pressure as a variable.

We can use the Shuttle vehicles as a source of group level variation that we can exploit in a multilevel model. Here we are using Stan to produce a Hamiltonian MCMC model for the four types of Shuttle vehicles.

There does not appear to be much variation between the Suttle vehicle groups. This would be expected since the basic design and launch physics of each vehicle would be similar.

A final area to explore is whether there is any interaction between launch temperature and weight to include in our prediction.

There may be a small degree of interaction between temperature and weight, but temperature is the dominant factor here.

Now to compare the various models:

Model 2 (m2) has the lowest WAIC, and is given 61% of the weight. It is also the simplest model of the group. It can now be used to predict a probability distribution for O-ring damage at lower temperatures.

The number of distressed O-rings stays near zero at temperatures above 60 drgrees F. At 31 degrees F, up to 4 distressed O-rings are within the probability interval. The mean line (the densest part of the probability interval), shows 1 distressed O-ring at 31 degrees F).

No comments:

Post a Comment

Pages

Labels

My Blog List

Sunday, December 3, 2017

A Bayesian Look at the Challenger Disaster

A Bayesian Look at the Challenger Disaster

David P. Nichols, MD, MPH

November 25, 2017

First, load the required packages

The data for this analysis can be downloaded from the UCI database UCI. To make it easier, I have recompiled the data into a data frame(orings) that you can just copy.

You can see the distribution of normal and distressed O-rings found on the four Shuttle vehicles.

Only 18 points are visible due to overlap. The normal and distressed O-rings appear to be fairly evenly distributed among the four vehicles.

Below, in the boxplots, lower temperatures are associated with more distressed O-rings. The lowest launch temperature, until the Challenger disaster (31 degrees F), was 53 degrees F.

I have used weakly informative priors primarily to act as a brake on overfitting. Quadratic approximation will be used for most of the models and MCMC(using Stan) for one other model.

Now to increase the complexity of the model by adding launch temperature (bt).

The plot for model m2 shows that the coefficient for launch temperature is -0.02 and the right tail (at 94.5%) does not go past zero.

Model m3 adds launch weight (bw) as a predictor. I have chosen to scale the weight since it is several orders of magnitude greater than the values for temperature. This will make the computation less complex and will guard against overfitting.

Model m3 shows that launch tmperature retains its position relative to zero. Launch weight appears to have a negative effect on O-ring damage (more weight, less damage; less weight, more damage), but it has a high degree of uncertainty compared to temperature.

Model m4 adds scaled leak-check pressure as a variable.

Pressure (bp) appears to strongly related to O-ring damage, but it also has a high level of uncertainty.

We can use the Shuttle vehicles as a source of group level variation that we can exploit in a multilevel model. Here we are using Stan to produce a Hamiltonian MCMC model for the four types of Shuttle vehicles.

There does not appear to be much variation between the Suttle vehicle groups. This would be expected since the basic design and launch physics of each vehicle would be similar.

A final area to explore is whether there is any interaction between launch temperature and weight to include in our prediction.

There may be a small degree of interaction between temperature and weight, but temperature is the dominant factor here.

Now to compare the various models:

Model 2 (m2) has the lowest WAIC, and is given 61% of the weight. It is also the simplest model of the group. It can now be used to predict a probability distribution for O-ring damage at lower temperatures.

The number of distressed O-rings stays near zero at temperatures above 60 drgrees F. At 31 degrees F, up to 4 distressed O-rings are within the probability interval. The mean line (the densest part of the probability interval), shows 1 distressed O-ring at 31 degrees F).

No comments:

Post a Comment

My Blog List