Chapter 4 Nested Case Control Study
4.1 simple tutor for nested case control (NCC) study
ref: https://biostat3.net/download/R/labs/q25.html https://users.ox.ac.uk/~scro1407/nested_cc_case_subcohort_7jul17.pdf
상기 사이트에서 피부암 연구에 대해서 NCC 연구 튜터를 올려놓았습니다. 여기는 이것을 정리한 수준으로 자세한 내용은 원사이트를 보시기 바랍니다. 우선 아래의 library를 설치하고, 불러옵니다.
if(!require(biostat3)) install.packages('biostat3'); library(biostat3)
if(!require(Epi)) install.packages('Epi');library(Epi)
if(!require(survival)) install.packages('survival');library(survival)
if(!require(tidyverse)) install.packages('tidyverse');library(tidyverse)
실습할 데이터를 불러옵니다. data(melanoma)
로 데이터를 불러옵니다. 이후 데이터를 Localised
이면서 cancer
로 사망하며, 12개월 *10년인 사람으로 데이터를 만들겠습니다.
data(melanoma)
= melanoma %>%
mel filter(stage =="Localised") %>%
mutate(dc = status =='Dead: cancer' & surv_mm <120) %>%
mutate(surv_10y = case_when(
>= 120 ~ 120,
surv_mm TRUE ~ surv_mm
))
우선 Cox-Proportional model로 생존분석을 수행해 보겠습니다.
= mel %>%
out_coh coxph(data=.,
Surv(surv_10y, dc) ~ sex + year8594 + agegrp)
summary(out_coh)
## Call:
## coxph(formula = Surv(surv_10y, dc) ~ sex + year8594 + agegrp,
## data = .)
##
## n= 5318, number of events= 960
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.53061 0.58825 0.06545 -8.107 5.19e-16 ***
## year8594Diagnosed 85-94 -0.33339 0.71649 0.06618 -5.037 4.72e-07 ***
## agegrp45-59 0.28283 1.32688 0.09417 3.003 0.00267 **
## agegrp60-74 0.62006 1.85904 0.09088 6.823 8.90e-12 ***
## agegrp75+ 1.21801 3.38045 0.10443 11.663 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.5882 1.7000 0.5174 0.6688
## year8594Diagnosed 85-94 0.7165 1.3957 0.6293 0.8157
## agegrp45-59 1.3269 0.7536 1.1032 1.5959
## agegrp60-74 1.8590 0.5379 1.5557 2.2215
## agegrp75+ 3.3804 0.2958 2.7547 4.1483
##
## Concordance= 0.646 (se = 0.009 )
## Likelihood ratio test= 212.7 on 5 df, p=<2e-16
## Wald test = 217.9 on 5 df, p=<2e-16
## Score (logrank) test = 226.8 on 5 df, p=<2e-16
생존분석 자체에 대해서는 통계학 교과서를 꼭 공부하시기를 바랍니다.
코호트에 몇명이 있는가 확인해봅니다.
= length(mel$id)
n_ind = sum(mel$dc ==1)
n_case cbind(n_ind, n_case)
## n_ind n_case
## [1,] 5318 960
control 집단을 선택해 보겠습니다.
각 옵션의 의미는 https://rdrr.io/cran/Epi/man/ccwc.html 를 참고하시고 아래 표에 정리하였습니다.
옵션 | 의미 |
---|---|
entry | Time of entry to follow-up |
exit | Time of exit from follow-up |
fail | Status on exit (1=Fail, 0=Censored) |
origin | Origin of analysis time scale |
controls | The number of controls to be selected for each case |
match | List of categorical variables on which to match cases and controls |
include | List of other variables to be carried across into the case-control study |
data | Data frame in which to look for input variables |
silent | If FALSE, echos a . to the screen for each case-control set created; otherwise produces no output. |
= ccwc(entry =0,
nccdata exit = surv_10y,
fail =dc,
origin =0,
controls=1,
match=list(sex),
include=list(sex, year8594, agegrp, dc, id),
silent = FALSE,
data=mel)
##
## Sampling risk sets: ...................................................................................................................................................................................................................
## Warning in ccwc(entry = 0, exit = surv_10y, fail = dc, origin = 0, controls =
## 1, : there were tied failure times
conditional logistic regression
<- clogit(Fail ~ sex + year8594 + agegrp + strata(Set), data=nccdata)
out_ncc = nccdata%>% select(-Time)
nccdata2 = clogit(Fail ~ sex + year8594 + agegrp + strata(Set), data=nccdata2)
out_ncc2 <- glm(Fail ~ sex + year8594 + agegrp + Set,
out_logi family = binomial,
data=nccdata)
= nccdata %>%
out_coh_ncc coxph(data=.,
Surv(Time, Fail) ~ sex+ year8594 + agegrp +strata(Set))
summary(out_coh_ncc)
## Call:
## coxph(formula = Surv(Time, Fail) ~ sex + year8594 + agegrp +
## strata(Set), data = .)
##
## n= 1920, number of events= 960
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale NA NA 0.00000 NA NA
## year8594Diagnosed 85-94 -0.24737 0.78085 0.07069 -3.499 0.000467 ***
## agegrp45-59 0.29566 1.34402 0.09997 2.957 0.003102 **
## agegrp60-74 0.55120 1.73534 0.09715 5.674 1.4e-08 ***
## agegrp75+ 0.95963 2.61073 0.11400 8.417 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale NA NA NA NA
## year8594Diagnosed 85-94 0.7809 1.2807 0.6798 0.8969
## agegrp45-59 1.3440 0.7440 1.1049 1.6349
## agegrp60-74 1.7353 0.5763 1.4345 2.0993
## agegrp75+ 2.6107 0.3830 2.0880 3.2644
##
## Concordance= 0.63 (se = 0.015 )
## Likelihood ratio test= 84.63 on 4 df, p=<2e-16
## Wald test = 84.75 on 4 df, p=<2e-16
## Score (logrank) test = 87.14 on 4 df, p=<2e-16
4.2 Sampling schemes for nested case-control studies
Disregarding time - cumulative case-control sampling > most appropriate for short duration of observation