Biliary Cirrhosis
David P. Nichols, MD, MPH
August 22, 2018
This data is derived from the Mayo Clinic trial in primary biliary cirrhosis (PBC) of the liver conducted between 1974 and 1984. A description of the clinical background for the trial and the covariates recorded here can be found in Dickson, et al., Hepatology 10:1-7 (1989) and in Markus, et al., N Eng J of Med 320:1709-13 (1989). “A total of 424 PBC patients, referred to Mayo Clinic during that ten-year interval, met eligibility criteria for the randomized placebo controlled trial of the drug D-penicillamine. The first 312 cases in the data set participated in the randomized trial and contain largely complete data.” Missing data items are denoted by “.”
Primary biliary cirrhosis (now more accurately termed primary biliary cholangitis) is a chronic disease in which the bile ducts in the liver are slowly destroyed. When the bile ducts are damaged, bile can back up in the liver and sometimes lead to irreversible scarring of liver tissue (cirrhosis). Primary biliary cholangitis is considered an autoimmune disease, which means your body’s immune system is mistakenly attacking healthy cells and tissue. Researchers think a combination of genetic and environmental factors triggers the disease. It usually develops slowly. Medication can slow liver damage, especially if treatment begins early.
We will treat this data as a survival analysis problem. In addition to looking at the treatment’s efficacy, we can look at the effectiveness of the various physical signs and laboratory values included as covariates which may help to indicate prognosis for survival outcome.
First, load the necessary libraries:
set.seed(123)
library(tidyverse)
library(survival)
library(tableone)
library(Matching)
library(survminer)
library(survMisc)
library(survcomp)
library(broom)
library(tidyr)
library(flexsurv)
library(GGally)
library(purrr)
library(stringr)
library(gridExtra)
library(vcd)
I have stored the data on Google Sheets. It can be obtained by:
myurl <- "https://docs.google.com/spreadsheets/d/e/2PACX-1vTvuWL7vHEYFv9N0KV1C3ZVXDYhTMOS6n4PCjop8vaeK1IJQFsFTuh4bJYkdoYACZbc1upKeaykwqnK/pub?output=csv"
bcir <- read_csv(url(myurl))
colnames(bcir) <- c("id", "time", "event", "rx", "age",
"sex","ascites", "hepmeg", "spiders",
"edema", "bili", "chol", "alb", "cu",
"alkphos", "sgot", "trig", "plat", "ptt",
"stage")
glimpse(bcir)
## Observations: 311
## Variables: 20
## $ id <int> 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17...
## $ time <int> 4500, 1012, 1925, 1504, 2503, 1832, 2466, 2400, 51, 37...
## $ event <int> 0, 2, 2, 1, 2, 0, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2, 2, 0, ...
## $ rx <int> 1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, ...
## $ age <int> 20617, 25594, 19994, 13918, 24201, 20284, 19379, 15526...
## $ sex <int> 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, ...
## $ ascites <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...
## $ hepmeg <int> 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, ...
## $ spiders <int> 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, ...
## $ edema <dbl> 0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0,...
## $ bili <dbl> 1.1, 1.4, 1.8, 3.4, 0.8, 1.0, 0.3, 3.2, 12.6, 1.4, 3.6...
## $ chol <chr> "302", "176", "244", "279", "248", "322", "280", "562"...
## $ alb <dbl> 4.14, 3.48, 2.54, 3.53, 3.98, 4.09, 4.00, 3.08, 2.74, ...
## $ cu <chr> "54", "210", "64", "143", "50", "52", "52", "79", "140...
## $ alkphos <dbl> 7394.8, 516.0, 6121.8, 671.0, 944.0, 824.0, 4651.2, 22...
## $ sgot <dbl> 113.52, 96.10, 60.63, 113.15, 93.00, 60.45, 28.38, 144...
## $ trig <chr> "88", "55", "92", "72", "63", "213", "189", "88", "143...
## $ plat <chr> "221", "151", "183", "136", ".", "204", "373", "251", ...
## $ ptt <dbl> 10.6, 12.0, 10.3, 10.9, 11.0, 9.7, 11.0, 11.0, 11.5, 1...
## $ stage <int> 3, 4, 4, 3, 3, 3, 3, 2, 4, 4, 4, 3, 4, 3, 3, 4, 4, 3, ...
Change the treatment to 0’s and 1’s
bcir$rx <- bcir$rx -1
Include patients who had liver transplants in the censored group
#original study event was 0 = censored, 1 = transplant, 2 = death
#will count censored and transplant as censored
bcir <- bcir %>% mutate(event = if_else(event == 0 | event == 1, 0, 1))
bcir <- bcir %>% mutate(rx = as.integer(rx))
bcir <- bcir %>% mutate(age_yr = round(age/365))
Extract the numeric variables to work on them separately
#extract numerics
nums <- bcir %>% dplyr::select(bili, chol, alb, cu, alkphos, sgot,
trig, plat, ptt, age_yr)
Change the missing data “.” to NA so it can be imputed
#changes "." to NA by coercion
nums <- nums %>% map_at(c("chol", "cu", "trig", "plat"), as.numeric)
A data frame is more manageable than a list.
#change list to data frame
nums <- do.call(bind_cols, nums)
Inspect the distributions of the numeric variables
#visualize(numerics)
p1 <- ggplot(nums) +
geom_density(aes(x = bili))
p2 <- ggplot(nums) +
geom_density(aes(x = chol))
p3 <- ggplot(nums) +
geom_density(aes(x = alb))
p4 <- ggplot(nums) +
geom_density(aes(x = cu))
p5 <- ggplot(nums) +
geom_density(aes(x = alkphos))
p6 <- ggplot(nums) +
geom_density(aes(x = sgot))
p7 <- ggplot(nums) +
geom_density(aes(x = trig))
p8 <- ggplot(nums) +
geom_density(aes(x = plat))
p9 <- ggplot(nums) +
geom_density(aes(x = ptt))
p10 <- ggplot(nums) +
geom_density(aes(x = age_yr))
gridExtra::grid.arrange(p1, p2, p3, p4, p5,
p6, p7, p8, p9, p10,
nrow = 5)
Taking logs will improve the distributions (more normal appearing) for analysis
#bili chol alkphos cu have large right skews, get logs
nums <- nums %>% mutate(lbili = log(bili))
nums <- nums %>% mutate(lchol = log(chol))
nums <- nums %>% mutate(lalkphos = log(alkphos))
nums <- nums %>% mutate(lcu = log(cu))
nums <- nums %>% mutate(lsgot = log(sgot))
nums <- nums %>% mutate(lptt = log(ptt))
nums <- nums %>% mutate(ltrig = log(trig))
#discard nums which have been logged
nums1 <- nums %>% dplyr::select(lbili, lchol, lalkphos, lcu, lsgot,
lptt, ltrig, alb, plat, age_yr)
I will deal with missing data by random imputation, Please see my previous blog posts for an explanation of this method: Random Imputation - January 2018 and Random vs MICE imputation - July 2018. We will need to obtain the means and standard deviations of the variables that we will impute.
#Random impute the NA's
#first confirm that we have NA's
nums1 %>% summarize_all(mean, na.rm = FALSE)
## # A tibble: 1 x 10
## lbili lchol lalkphos lcu lsgot lptt ltrig alb plat age_yr
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.569 NA 7.27 NA 4.71 2.37 NA 3.52 NA 50.0
#then get the means of each variable
means <- nums1 %>% summarize_all(mean, na.rm = TRUE)
means
## # A tibble: 1 x 10
## lbili lchol lalkphos lcu lsgot lptt ltrig alb plat age_yr
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.569 5.80 7.27 4.26 4.71 2.37 4.72 3.52 262. 50.0
#then get the standard deviations of each variable
sds <- nums1 %>% summarize_all(sd, na.rm = TRUE)
sds
## # A tibble: 1 x 10
## lbili lchol lalkphos lcu lsgot lptt ltrig alb plat age_yr
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1.03 0.437 0.723 0.825 0.449 0.0881 0.447 0.417 95.7 10.6
Using random imputation, impute the variables with NA’s: lchol, lcu, ltrig, plat
#random impute the NA's
#lchol
nums1$lchol <- if_else(is.na(nums1$lchol),
rnorm(1, 5.80, 0.437),
nums1$lchol)
#lcu
nums1$lcu <- if_else(is.na(nums1$lcu),
rnorm(1, 4.26, 0.825),
nums1$lcu)
#ltrig
nums1$ltrig <- if_else(is.na(nums1$ltrig),
rnorm(1, 4.72, 0.447),
nums1$ltrig)
#plat
nums1$plat <- if_else(is.na(nums1$plat),
rnorm(1, 262, 95.6),
nums1$plat)
#now summarize again to check for any NA's left
nums1 %>% summarize_all(mean, na.rm = FALSE)
## # A tibble: 1 x 10
## lbili lchol lalkphos lcu lsgot lptt ltrig alb plat age_yr
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.569 5.78 7.27 4.26 4.71 2.37 4.79 3.52 262. 50.0
Prepare a pairs plot, looking for multicollinarity problems:
#pairs plot
GGally::ggpairs(nums1,
lower = list(continuous = wrap("points",
alpha = 0.2,
color = "blue",
size = .3)))
None of the continuous variables are highly correlated. The higher correlations occur among the tests usually positive in chronic liver disease(alkaline phosphatase, copper and bilirubin)
#plot variables
#connect the data frames back up
#first break off the non_numerics from bcir
bcir_non_numeric <- bcir %>% dplyr::select(id, time, event, rx, sex, ascites,
hepmeg, spiders, edema, stage)
#fix edema to get rid of decimal .5 value
bcir_non_numeric <- bcir_non_numeric %>%
mutate(edema1 = case_when(
edema == 0 ~ 0,
edema == 0.5 ~ 1,
edema == 1 ~ 2,
TRUE ~ 999))
bcir_non_numeric <- bcir_non_numeric %>%
dplyr::select(-edema) %>%
rename(edema = "edema1") %>%
mutate(edema = as.integer(bcir_non_numeric$edema))
#connect numeric and non-numeric
bcir1 <- bcir_non_numeric %>% bind_cols(nums1)
#check for NA's
sum(is.na(bcir1))
## [1] 0
Next, we will look for evidence of selection bias in this study. Might the treatment have been assigned based on the patient’s physical signs rather than on a randomized schedule?
#Could treament be assigned based on presence of symptoms?
#break off categoricals as a matrix
cats <- bcir1 %>% dplyr::select(rx, sex, ascites,
hepmeg, spiders, edema, stage)
cats1 <- cats %>% map(as.numeric)
#cats3 <- data.frame(cats3)
#cats3 <- as.tibble(cats3)
tab1 <- xtabs(~ cats$rx + cats$sex)
tab2 <- xtabs(~ cats$rx + cats$ascites)
tab3 <- xtabs(~ cats$rx + cats$hepmeg)
tab4 <- xtabs(~ cats$rx + cats$spiders)
tab5 <- xtabs(~ cats$rx + cats$edema)
tab6 <- xtabs(~ cats$rx + cats$stage)
p1 <- vcd::mosaic(tab1)
p2 <- vcd::mosaic(tab2)
p3 <- vcd::mosaic(tab3)
p4 <- vcd::mosaic(tab4)
p5 <- vcd::mosaic(tab5)
p6 <- vcd::mosaic(tab6)
The mosaic plots show that the treatment and physical signs are evenly matched up. The areas of the rectangles are about the same for all the variables when split into treated and non-treated groups.
#Could treament have been assigned based on presence of abnormal labs?
p1 <-ggplot(bcir1) +
geom_boxplot(aes(x = factor(rx), y = lbili), fill = "#6588a7") +
labs(x = "rx")
p2 <-ggplot(bcir1) +
geom_boxplot(aes(x = factor(rx), y = lchol), fill = "#6588a7") +
labs(x = "rx")
p3 <-ggplot(bcir1) +
geom_boxplot(aes(x = factor(rx), y = lalkphos), fill = "#6588a7") +
labs(x = "rx")
p4 <-ggplot(bcir1) +
geom_boxplot(aes(x = factor(rx), y = lcu), fill = "#6588a7") +
labs(x = "rx")
p5 <-ggplot(bcir1) +
geom_boxplot(aes(x = factor(rx), y = lsgot), fill = "#6588a7") +
labs(x = "rx")
p6 <-ggplot(bcir1) +
geom_boxplot(aes(x = factor(rx), y = lptt), fill = "#6588a7") +
labs(x = "rx")
p7 <-ggplot(bcir1) +
geom_boxplot(aes(x = factor(rx), y = ltrig), fill = "#6588a7") +
labs(x = "rx")
p8 <-ggplot(bcir1) +
geom_boxplot(aes(x = factor(rx), y = alb), fill = "#6588a7") +
labs(x = "rx")
p9 <-ggplot(bcir1) +
geom_boxplot(aes(x = factor(rx), y = plat), fill = "#6588a7") +
labs(x = "rx")
grid.arrange(p1, p2, p3, p4, p5, p6, p6, p7, p8, p9,
ncol = 4)
It looks as though the labs are evenly divided between treatment and non-treatment groups. So far there is no evidence of selection bias.
For a final check on selection bias we can see if there could be any improvement on the matching of treatment and non-treatment groups. The first thing is to create a vector of propensity scores by using a glm model using our data with treatment as the dependent variable.
psmodel <- glm(rx ~ sex + ascites + hepmeg + spiders + edema +
lbili + lchol + lalkphos + lcu + lsgot + lptt +
ltrig + alb + plat + age_yr + stage,
family = "binomial",
data = bcir1)
pscore <- psmodel$fitted.values #propensity scores
bcir1$pscore <- pscore
ps <- bcir1 %>% dplyr::select(rx, pscore)
table(ps$rx)
##
## 0 1
## 157 154
Next, we’ll make a comparison table showing the covariates in the two treatment groups. We want the standardized differences (SMD’s) to be less than 0.1 for the best match.
xvars <- c( "sex", "ascites", "hepmeg", "spiders", "edema",
"lbili", "lchol", "lalkphos", "lcu", "lsgot",
"lptt", "ltrig", "alb", "plat", "age_yr", "stage")
unmatchedtab1 <- tableone::CreateTableOne(vars = xvars,
strata = "rx",
data = bcir1,
test = FALSE)
print(unmatchedtab1, smd = TRUE)
## Stratified by rx
## 0 1 SMD
## n 157 154
## sex (mean (sd)) 0.87 (0.34) 0.90 (0.30) 0.114
## ascites (mean (sd)) 0.08 (0.28) 0.06 (0.25) 0.068
## hepmeg (mean (sd)) 0.46 (0.50) 0.56 (0.50) 0.213
## spiders (mean (sd)) 0.28 (0.45) 0.29 (0.46) 0.026
## edema (mean (sd)) 0.06 (0.23) 0.06 (0.25) 0.032
## lbili (mean (sd)) 0.52 (0.95) 0.61 (1.10) 0.088
## lchol (mean (sd)) 5.78 (0.39) 5.78 (0.45) 0.005
## lalkphos (mean (sd)) 7.28 (0.74) 7.27 (0.71) 0.018
## lcu (mean (sd)) 4.24 (0.82) 4.27 (0.83) 0.030
## lsgot (mean (sd)) 4.69 (0.45) 4.73 (0.45) 0.086
## lptt (mean (sd)) 2.36 (0.08) 2.37 (0.10) 0.143
## ltrig (mean (sd)) 4.79 (0.49) 4.78 (0.45) 0.009
## alb (mean (sd)) 3.52 (0.44) 3.52 (0.40) 0.004
## plat (mean (sd)) 259.32 (99.86) 265.25 (90.14) 0.062
## age_yr (mean (sd)) 51.43 (11.03) 48.61 (9.95) 0.268
## stage (mean (sd)) 2.97 (0.94) 3.09 (0.81) 0.140
For the most part, this is a pretty good match. Let’s see if we can improve the match by matching on the propensity scores.
psmatch <- Matching::Match(Tr = bcir1$rx,
M = 1,
X = pscore,
replace = FALSE)
matched <- bcir1[unlist(psmatch[c("index.treated",
"index.control")]),]
matchedtab1 <- tableone::CreateTableOne(vars = xvars,
strata = "rx",
data = matched,
test = FALSE)
print(matchedtab1, smd = TRUE)
## Stratified by rx
## 0 1 SMD
## n 154 154
## sex (mean (sd)) 0.88 (0.33) 0.90 (0.30) 0.083
## ascites (mean (sd)) 0.08 (0.27) 0.06 (0.25) 0.050
## hepmeg (mean (sd)) 0.46 (0.50) 0.56 (0.50) 0.208
## spiders (mean (sd)) 0.27 (0.45) 0.29 (0.46) 0.043
## edema (mean (sd)) 0.06 (0.24) 0.06 (0.25) 0.027
## lbili (mean (sd)) 0.51 (0.95) 0.61 (1.10) 0.102
## lchol (mean (sd)) 5.77 (0.40) 5.78 (0.45) 0.009
## lalkphos (mean (sd)) 7.26 (0.73) 7.27 (0.71) 0.002
## lcu (mean (sd)) 4.23 (0.81) 4.27 (0.83) 0.050
## lsgot (mean (sd)) 4.69 (0.45) 4.73 (0.45) 0.096
## lptt (mean (sd)) 2.36 (0.08) 2.37 (0.10) 0.149
## ltrig (mean (sd)) 4.78 (0.49) 4.78 (0.45) 0.002
## alb (mean (sd)) 3.52 (0.44) 3.52 (0.40) 0.002
## plat (mean (sd)) 260.87 (99.86) 265.25 (90.14) 0.046
## age_yr (mean (sd)) 51.08 (10.81) 48.61 (9.95) 0.238
## stage (mean (sd)) 2.97 (0.94) 3.09 (0.81) 0.141
We can compare the list of SMD’s to see if there has been any beneit from matching.
getsmd <- function(df) {
options(digits = 3)
print("SMD")
x <- tableone::ExtractSmd(df)
print(x)
paste("sum of SMD's = ", round(sum(x[1:length(x)]), 3))
}
getsmd(matchedtab1)
## [1] "SMD"
## 1 vs 2
## sex 0.08269
## ascites 0.05028
## hepmeg 0.20831
## spiders 0.04314
## edema 0.02690
## lbili 0.10216
## lchol 0.00888
## lalkphos 0.00206
## lcu 0.05045
## lsgot 0.09636
## lptt 0.14908
## ltrig 0.00191
## alb 0.00171
## plat 0.04601
## age_yr 0.23758
## stage 0.14062
## [1] "sum of SMD's = 1.248"
getsmd(unmatchedtab1)
## [1] "SMD"
## 1 vs 2
## sex 0.11353
## ascites 0.06813
## hepmeg 0.21326
## spiders 0.02636
## edema 0.03167
## lbili 0.08757
## lchol 0.00502
## lalkphos 0.01759
## lcu 0.03004
## lsgot 0.08580
## lptt 0.14295
## ltrig 0.00891
## alb 0.00414
## plat 0.06239
## age_yr 0.26821
## stage 0.14010
## [1] "sum of SMD's = 1.306"
There really isn’t much difference between the unmatched and matched groups so it looks as if the reseachers did the best possible job of matching through randomization of treatment.
Kaplan-Meier Curves
Efficacy of treatment
#KM Curve - rx
m1<- survfit(Surv(time, event) ~ rx, data = bcir1)
survdiff(Surv(time, event) ~ rx, data = bcir1)
## Call:
## survdiff(formula = Surv(time, event) ~ rx, data = bcir1)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## rx=0 157 64 62.7 0.0283 0.0572
## rx=1 154 60 61.3 0.0289 0.0572
##
## Chisq= 0.1 on 1 degrees of freedom, p= 0.8
ggsurvplot(m1, data = bcir1,
surv.median.line = "hv",
conf.int = TRUE) # Add medians survival
There is no signicicant difference between the treatment curves. There isn’t even a segment of the curves that shows any difference.
Sex
m2<- survfit(Surv(time, event) ~ sex, data = bcir1)
survdiff(Surv(time, event) ~ sex, data = bcir1)
## Call:
## survdiff(formula = Surv(time, event) ~ sex, data = bcir1)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=0 36 22 14.5 3.868 4.43
## sex=1 275 102 109.5 0.513 4.43
##
## Chisq= 4.4 on 1 degrees of freedom, p= 0.04
ggsurvplot(m2, data = bcir1,
surv.median.line = "hv",
conf.int = TRUE)
There is no significant survival difference between men and women. The confidence intervals overlap along the whole length of the curves.
Stage of Disease
m3<- survfit(Surv(time, event) ~ stage, data = bcir1)
survdiff(Surv(time, event) ~ stage, data = bcir1)
## Call:
## survdiff(formula = Surv(time, event) ~ stage, data = bcir1)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## stage=1 16 1 9.84 7.95 8.73
## stage=2 67 16 32.08 8.06 10.96
## stage=3 120 43 50.79 1.20 2.03
## stage=4 108 64 31.28 34.23 46.49
##
## Chisq= 52.5 on 3 degrees of freedom, p= 2e-11
ggsurvplot(m3, data = bcir1,
surv.median.line = "hv",
conf.int = TRUE)
The first three stages show similar, overlapping survival curves. Stage 4 patients showed a significantly worse survival.
Ascites(fluid accumulation in peritoneal cavity)
m4<- survfit(Surv(time, event) ~ ascites, data = bcir1)
survdiff(Surv(time, event) ~ ascites, data = bcir1)
## Call:
## survdiff(formula = Surv(time, event) ~ ascites, data = bcir1)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## ascites=0 288 102 120.39 2.81 97.4
## ascites=1 23 22 3.61 93.68 97.4
##
## Chisq= 97.4 on 1 degrees of freedom, p= <2e-16
ggsurvplot(m4, data = bcir1,
surv.median.line = "hv",
conf.int = TRUE)
Those patients with ascites (fluid accumulation in the peritoneal cavity) showed a significantly worse survival. This indicates that the presence or absence of diabetes is a good indicator of survival prognosis.
Hepatomegaly(enlarged liver)
m5<- survfit(Surv(time, event) ~ hepmeg, data = bcir1)
survdiff(Surv(time, event) ~ hepmeg, data = bcir1)
## Call:
## survdiff(formula = Surv(time, event) ~ hepmeg, data = bcir1)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## hepmeg=0 152 37 71.2 16.5 39.4
## hepmeg=1 159 87 52.8 22.2 39.4
##
## Chisq= 39.4 on 1 degrees of freedom, p= 3e-10
ggsurvplot(m5, data = bcir1,
surv.median.line = "hv",
conf.int = TRUE)
Those patients with hepatomegaly (liver enlargement) sohow a significantly worse survival indicating that this physical sign is a good prognosticator of poor survival.
Spiders(skin vascular malformations)
m6<- survfit(Surv(time, event) ~ spiders, data = bcir1)
survdiff(Surv(time, event) ~ spiders, data = bcir1)
## Call:
## survdiff(formula = Surv(time, event) ~ spiders, data = bcir1)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## spiders=0 222 73 97.4 6.12 29
## spiders=1 89 51 26.6 22.44 29
##
## Chisq= 29 on 1 degrees of freedom, p= 7e-08
ggsurvplot(m6, data = bcir1,
surv.median.line = "hv",
conf.int = TRUE)
Likewise, spiders (skin vascular structures that have the appearance of spiders are a good predictor of poor survival.
Edema(fluid swelling in the legs)
m7<- survfit(Surv(time, event) ~ edema, data = bcir1)
survdiff(Surv(time, event) ~ edema, data = bcir1)
## Call:
## survdiff(formula = Surv(time, event) ~ edema, data = bcir1)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## edema=0 292 106 121.68 2.02 109
## edema=1 19 18 2.32 105.92 109
##
## Chisq= 109 on 1 degrees of freedom, p= <2e-16
ggsurvplot(m7, data = bcir1,
surv.median.line = "hv",
conf.int = TRUE)
Edema (fluid swelling in the legs) is also a reliable sign of poor survival, shown by the significant difference in survival curves between the groups having and not having spider skin vessels.
Evaluation of Laboratory Test Prognostic Value
The two functions below will categorize the lab tests at the 50th percentile and look for significant differences in survival between the two groups. These comparisons will indicate the prognostic value of the lab tests in primary biliary cirrhosis.
#Categorize the numerics for K-M estimator
#bili - not the log value
numcat_km <- function(var) {
var_cut <- cut(var,
quantile(var, c( 0.00, 0.50, 1.00),
na.rm = TRUE))
bcir1$var_cut <- var_cut
m8<- survfit(Surv(time, event) ~ var_cut, data = bcir1)
ggsurvplot(m8, data = bcir1,
surv.median.line = "hv",
conf.int = TRUE)
}
numcat_diff <- function(var) {
var_cut <- cut(var,
quantile(var, c( 0.00, 0.50, 1.00),
na.rm = TRUE))
bcir1$var_cut <- var_cut
survdiff(Surv(time, event) ~ var_cut, data = bcir1)
}
## [1] "Bilirubin - Good prognostic value"
## Call:
## survdiff(formula = Surv(time, event) ~ var_cut, data = bcir1)
##
## n=308, 3 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## var_cut=(0.3,1.3] 153 26 75.3 32.3 86.8
## var_cut=(1.3,28] 155 96 46.7 52.0 86.8
##
## Chisq= 86.8 on 1 degrees of freedom, p= <2e-16
## [1] "Cholesterol - Poor prognostic value"
## Call:
## survdiff(formula = Surv(time, event) ~ var_cut, data = bcir1)
##
## n=282, 29 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## var_cut=(120,310] 142 48 62.7 3.45 7.89
## var_cut=(310,1.78e+03] 140 65 50.3 4.29 7.89
##
## Chisq= 7.9 on 1 degrees of freedom, p= 0.005
## [1] "Albumin - Good prognostic value"
## Call:
## survdiff(formula = Surv(time, event) ~ var_cut, data = bcir1)
##
## n=310, 1 observation deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## var_cut=(1.96,3.55] 155 83 48.1 25.3 42.8
## var_cut=(3.55,4.64] 155 40 74.9 16.2 42.8
##
## Chisq= 42.8 on 1 degrees of freedom, p= 6e-11
## [1] "Copper - Good prognostic value"
## Call:
## survdiff(formula = Surv(time, event) ~ var_cut, data = bcir1)
##
## n=308, 3 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## var_cut=(4,73] 157 36 73.2 18.9 47.4
## var_cut=(73,588] 151 87 49.8 27.9 47.4
##
## Chisq= 47.4 on 1 degrees of freedom, p= 6e-12
## [1] "Alkaline Phosphatase - Marginal prognostic value"
## Call:
## survdiff(formula = Surv(time, event) ~ var_cut, data = bcir1)
##
## n=310, 1 observation deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## var_cut=(289,1.26e+03] 155 46 61.4 3.88 7.8
## var_cut=(1.26e+03,1.39e+04] 155 78 62.6 3.81 7.8
##
## Chisq= 7.8 on 1 degrees of freedom, p= 0.005
## [1] "SGOT - Good prognostic value"
## Call:
## survdiff(formula = Surv(time, event) ~ var_cut, data = bcir1)
##
## n=310, 1 observation deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## var_cut=(26.4,115] 156 42 71.9 12.4 30
## var_cut=(115,457] 154 82 52.1 17.2 30
##
## Chisq= 30 on 1 degrees of freedom, p= 4e-08
## [1] "Triglycerides - Marginal prognostic value"
## Call:
## survdiff(formula = Surv(time, event) ~ var_cut, data = bcir1)
##
## n=280, 31 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## var_cut=(33,108] 141 47 62.6 3.88 8.9
## var_cut=(108,598] 139 65 49.4 4.91 8.9
##
## Chisq= 8.9 on 1 degrees of freedom, p= 0.003
## [1] "PTT - Good prognostic value"
## Call:
## survdiff(formula = Surv(time, event) ~ var_cut, data = bcir1)
##
## n=310, 1 observation deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## var_cut=(9,10.6] 173 40 73.9 15.5 38.9
## var_cut=(10.6,17.1] 137 84 50.1 22.9 38.9
##
## Chisq= 38.9 on 1 degrees of freedom, p= 4e-10
Parametric test - Cox Proportional Hazards Model
#Cox Proportional Hzards model
m2 <- coxph(Surv(time, event) ~ rx + sex + ascites +
hepmeg + spiders + edema +
lbili + lchol + lalkphos + lcu + lsgot +
lptt + ltrig + alb + plat + age_yr + stage,
data = bcir1)
m2
## Call:
## coxph(formula = Surv(time, event) ~ rx + sex + ascites + hepmeg +
## spiders + edema + lbili + lchol + lalkphos + lcu + lsgot +
## lptt + ltrig + alb + plat + age_yr + stage, data = bcir1)
##
## coef exp(coef) se(coef) z p
## rx -2.80e-02 9.72e-01 1.98e-01 -0.14 0.88775
## sex -1.94e-02 9.81e-01 2.92e-01 -0.07 0.94697
## ascites 2.86e-01 1.33e+00 3.26e-01 0.88 0.38072
## hepmeg 1.31e-01 1.14e+00 2.41e-01 0.54 0.58851
## spiders -6.03e-02 9.41e-01 2.29e-01 -0.26 0.79203
## edema 8.82e-01 2.42e+00 3.41e-01 2.58 0.00976
## lbili 6.12e-01 1.84e+00 1.67e-01 3.68 0.00024
## lchol 1.23e-01 1.13e+00 2.64e-01 0.47 0.64070
## lalkphos -6.01e-02 9.42e-01 1.39e-01 -0.43 0.66568
## lcu 2.81e-01 1.33e+00 1.59e-01 1.77 0.07753
## lsgot 4.74e-01 1.61e+00 2.78e-01 1.70 0.08871
## lptt 3.31e+00 2.73e+01 1.22e+00 2.72 0.00652
## ltrig -2.67e-03 9.97e-01 2.31e-01 -0.01 0.99081
## alb -6.97e-01 4.98e-01 2.68e-01 -2.60 0.00931
## plat -7.49e-05 1.00e+00 1.15e-03 -0.06 0.94821
## age_yr 3.10e-02 1.03e+00 1.03e-02 2.99 0.00274
## stage 2.68e-01 1.31e+00 1.54e-01 1.74 0.08234
##
## Likelihood ratio test=212 on 17 df, p=<2e-16
## n= 311, number of events= 124
The Cox model confirms much of what we see in the non-parametric K-M models. There is no difference in survival between the treated and non-treated groups. Many of the tests and physical signs show significant associations.
Conclusion
This study shows no difference in survival between the two treatment groups. However, we can see that the physical signs and laboratory tests associated with liver disease contribute well to the prognostic decision making process. The Cox Rsquared is 0.498 which indicates that 50% of the outcome is not explained by the model. However, I think we have very good non-parametric models that inform the decision process well.
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] bindrcpp_0.2.2 vcd_1.4-4 gridExtra_2.3
## [4] GGally_1.4.0 flexsurv_1.1 broom_0.5.0
## [7] survcomp_1.30.0 prodlim_2018.04.18 survMisc_0.5.5
## [10] survminer_0.4.3 ggpubr_0.1.7 magrittr_1.5
## [13] Matching_4.9-3 MASS_7.3-50 tableone_0.9.3
## [16] survival_2.42-6 forcats_0.3.0 stringr_1.3.1
## [19] dplyr_0.7.6 purrr_0.2.5 readr_1.1.1
## [22] tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0
## [25] tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-137 survivalROC_1.0.3 cmprsk_2.2-7
## [4] lubridate_1.7.4 RColorBrewer_1.1-2 httr_1.3.1
## [7] rprojroot_1.3-2 tools_3.5.1 backports_1.1.2
## [10] utf8_1.1.4 R6_2.2.2 KernSmooth_2.23-15
## [13] lazyeval_0.2.1 colorspace_1.3-2 rmeta_3.0
## [16] withr_2.1.2 tidyselect_0.2.4 compiler_3.5.1
## [19] cli_1.0.0 rvest_0.3.2 xml2_1.2.0
## [22] labeling_0.3 scales_1.0.0 lmtest_0.9-36
## [25] mvtnorm_1.0-8 quadprog_1.5-5 digest_0.6.16
## [28] rmarkdown_1.10 pkgconfig_2.0.2 htmltools_0.3.6
## [31] labelled_1.1.0 rlang_0.2.2 readxl_1.1.0
## [34] rstudioapi_0.7 SuppDists_1.1-9.4 bindr_0.1.1
## [37] zoo_1.8-3 jsonlite_1.5 Matrix_1.2-14
## [40] Rcpp_0.12.18 munsell_0.5.0 fansi_0.3.0
## [43] stringi_1.1.7 yaml_2.2.0 mstate_0.2.11
## [46] plyr_1.8.4 crayon_1.3.4 lattice_0.20-35
## [49] haven_1.1.2 splines_3.5.1 hms_0.4.2
## [52] knitr_1.20 pillar_1.3.0 reshape2_1.4.3
## [55] glue_1.3.0 evaluate_0.11 data.table_1.11.4
## [58] modelr_0.1.2 deSolve_1.21 cellranger_1.1.0
## [61] bootstrap_2017.2 gtable_0.2.0 muhaz_1.2.6
## [64] reshape_0.8.7 km.ci_0.5-2 assertthat_0.2.0
## [67] xtable_1.8-2 survey_3.33-2 e1071_1.7-0
## [70] class_7.3-14 KMsurv_0.1-5 lava_1.6.3