Chapter 4 Nested Case Control Study

4.1 simple tutor for nested case control (NCC) study

ref: https://biostat3.net/download/R/labs/q25.html https://users.ox.ac.uk/~scro1407/nested_cc_case_subcohort_7jul17.pdf

상기 사이트에서 피부암 연구에 대해서 NCC 연구 튜터를 올려놓았습니다. 여기는 이것을 정리한 수준으로 자세한 내용은 원사이트를 보시기 바랍니다. 우선 아래의 library를 설치하고, 불러옵니다.

if(!require(biostat3)) install.packages('biostat3'); library(biostat3)
if(!require(Epi)) install.packages('Epi');library(Epi)
if(!require(survival)) install.packages('survival');library(survival)
if(!require(tidyverse)) install.packages('tidyverse');library(tidyverse)

실습할 데이터를 불러옵니다. data(melanoma)로 데이터를 불러옵니다. 이후 데이터를 Localised이면서 cancer로 사망하며, 12개월 *10년인 사람으로 데이터를 만들겠습니다.

data(melanoma)
mel = melanoma %>%
  filter(stage =="Localised") %>%
  mutate(dc = status =='Dead: cancer' & surv_mm <120) %>%
  mutate(surv_10y = case_when(
    surv_mm >= 120 ~ 120, 
    TRUE ~ surv_mm
  ))

우선 Cox-Proportional model로 생존분석을 수행해 보겠습니다.

out_coh = mel %>% 
  coxph(data=.,
        Surv(surv_10y, dc) ~ sex + year8594 + agegrp)
summary(out_coh)
## Call:
## coxph(formula = Surv(surv_10y, dc) ~ sex + year8594 + agegrp, 
##     data = .)
## 
##   n= 5318, number of events= 960 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale               -0.53061   0.58825  0.06545 -8.107 5.19e-16 ***
## year8594Diagnosed 85-94 -0.33339   0.71649  0.06618 -5.037 4.72e-07 ***
## agegrp45-59              0.28283   1.32688  0.09417  3.003  0.00267 ** 
## agegrp60-74              0.62006   1.85904  0.09088  6.823 8.90e-12 ***
## agegrp75+                1.21801   3.38045  0.10443 11.663  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.5882     1.7000    0.5174    0.6688
## year8594Diagnosed 85-94    0.7165     1.3957    0.6293    0.8157
## agegrp45-59                1.3269     0.7536    1.1032    1.5959
## agegrp60-74                1.8590     0.5379    1.5557    2.2215
## agegrp75+                  3.3804     0.2958    2.7547    4.1483
## 
## Concordance= 0.646  (se = 0.009 )
## Likelihood ratio test= 212.7  on 5 df,   p=<2e-16
## Wald test            = 217.9  on 5 df,   p=<2e-16
## Score (logrank) test = 226.8  on 5 df,   p=<2e-16

생존분석 자체에 대해서는 통계학 교과서를 꼭 공부하시기를 바랍니다.

코호트에 몇명이 있는가 확인해봅니다.

n_ind  = length(mel$id)
n_case = sum(mel$dc ==1)
cbind(n_ind, n_case)
##      n_ind n_case
## [1,]  5318    960

control 집단을 선택해 보겠습니다.

각 옵션의 의미는 https://rdrr.io/cran/Epi/man/ccwc.html 를 참고하시고 아래 표에 정리하였습니다.

옵션 의미
entry Time of entry to follow-up
exit Time of exit from follow-up
fail Status on exit (1=Fail, 0=Censored)
origin Origin of analysis time scale
controls The number of controls to be selected for each case
match List of categorical variables on which to match cases and controls
include List of other variables to be carried across into the case-control study
data Data frame in which to look for input variables
silent If FALSE, echos a . to the screen for each case-control set created; otherwise produces no output.
nccdata = ccwc(entry =0, 
               exit = surv_10y, 
               fail =dc, 
               origin =0, 
               controls=1, 
               match=list(sex),
               include=list(sex, year8594, agegrp, dc, id), 
               silent = FALSE,
               data=mel)
## 
## Sampling risk sets: ...................................................................................................................................................................................................................
## Warning in ccwc(entry = 0, exit = surv_10y, fail = dc, origin = 0, controls =
## 1, : there were tied failure times

conditional logistic regression

out_ncc <- clogit(Fail ~ sex + year8594 + agegrp + strata(Set), data=nccdata)
nccdata2 = nccdata%>% select(-Time) 
out_ncc2 = clogit(Fail ~ sex + year8594 + agegrp + strata(Set), data=nccdata2)
out_logi <- glm(Fail ~ sex + year8594 + agegrp + Set,
               family = binomial,
               data=nccdata)
out_coh_ncc = nccdata %>% 
  coxph(data=.,
        Surv(Time, Fail) ~ sex+ year8594 + agegrp +strata(Set))
summary(out_coh_ncc)
## Call:
## coxph(formula = Surv(Time, Fail) ~ sex + year8594 + agegrp + 
##     strata(Set), data = .)
## 
##   n= 1920, number of events= 960 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale                     NA        NA  0.00000     NA       NA    
## year8594Diagnosed 85-94 -0.24737   0.78085  0.07069 -3.499 0.000467 ***
## agegrp45-59              0.29566   1.34402  0.09997  2.957 0.003102 ** 
## agegrp60-74              0.55120   1.73534  0.09715  5.674  1.4e-08 ***
## agegrp75+                0.95963   2.61073  0.11400  8.417  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                      NA         NA        NA        NA
## year8594Diagnosed 85-94    0.7809     1.2807    0.6798    0.8969
## agegrp45-59                1.3440     0.7440    1.1049    1.6349
## agegrp60-74                1.7353     0.5763    1.4345    2.0993
## agegrp75+                  2.6107     0.3830    2.0880    3.2644
## 
## Concordance= 0.63  (se = 0.015 )
## Likelihood ratio test= 84.63  on 4 df,   p=<2e-16
## Wald test            = 84.75  on 4 df,   p=<2e-16
## Score (logrank) test = 87.14  on 4 df,   p=<2e-16

4.2 Sampling schemes for nested case-control studies

Disregarding time - cumulative case-control sampling > most appropriate for short duration of observation