Wednesday, July 20, 2011

Nested case-control study

A nested case-control study can be described as follows: for a particular disease, all subjects in a given cohort who develop the disease are labeled as "cases". For each case, a pre-specified number (say, 4) of "controls" is then sampled from the subjects who were still disease-free at the time the case's disease occurred, irrespective of whether those subjects become cases later. The design is attractive because it reduces cost with only a negligible loss of statistical efficiency compared with analyzing the whole cohort. More can be found here.
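The sampling scheme described above can be sketched in base R. This is only an illustration on a made-up cohort: the data frame `cohort`, the helper `sample_risk_set`, and all numbers are hypothetical and do not come from any real study.

```r
# Hypothetical toy cohort: every name and number here is illustrative.
set.seed(1)
n <- 200
cohort <- data.frame(
  id    = seq_len(n),
  time  = rexp(n, rate = 0.1),   # follow-up time
  event = rbinom(n, 1, 0.15)     # 1 = became diseased at `time`
)

m <- 4  # number of controls to sample per case

sample_risk_set <- function(case_row, cohort, m) {
  # Subjects still under follow-up when the case occurs;
  # they may themselves become cases later.
  at_risk <- cohort$id[cohort$time > case_row$time]
  k <- min(m, length(at_risk))
  # Sample indices rather than values to avoid the sample(x) pitfall
  # when only one subject is at risk.
  controls <- if (k > 0) at_risk[sample.int(length(at_risk), k)] else integer(0)
  data.frame(
    stratum = case_row$id,              # one matched set per case
    id      = c(case_row$id, controls),
    case    = c(1, rep(0, k))
  )
}

cases <- cohort[cohort$event == 1, ]
ncc <- do.call(rbind, lapply(seq_len(nrow(cases)), function(i)
  sample_risk_set(cases[i, ], cohort, m)))

head(ncc)
```

The resulting `ncc` data frame has one stratum per case, which is the layout that a conditional logistic regression with a `strata()` term expects.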



In R, the Epi package has the clogistic function and the survival package has the clogit function (among others) for conditional logistic regression, which is what the analysis of nested case-control data requires. Conditional logistic regression is equivalent to a Cox proportional hazards model stratified on the matched sets, conditioning at the time of event occurrence, so that a baseline hazard takes the place of the intercept in the model. Indeed, clogit is basically a wrapper for coxph. Some may prefer to call coxph directly if they want to use its additional functionality or other functions that depend on coxph directly. We can test their equivalence in the following ways:
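For readers who want the algebra behind this equivalence, here is a sketch assuming exactly one case per matched set. The notation is my own and is not taken from the packages' documentation. Under a stratified Cox model with hazard \lambda_{0s}(t) \exp(x^{\top}\beta), the contribution of set s to the partial likelihood is the probability that the case, rather than any other member of its risk set, fails at the event time. The baseline hazard cancels from numerator and denominator, leaving the conditional logistic likelihood:

```latex
% Assumed notation: s indexes matched sets, R_s is the matched set
% (the case plus its sampled controls), x_{sj} is the covariate vector
% of subject j in set s, and j = 0 denotes the case.
L(\beta) \;=\; \prod_{s}
  \frac{\exp\!\left(x_{s0}^{\top}\beta\right)}
       {\sum_{j \in R_s} \exp\!\left(x_{sj}^{\top}\beta\right)}
```

The equivalence therefore holds only when every control in a matched set is still in the risk set at its case's event time.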


> require(survival) # load the package if installed
> # "clogit" is a wrapper for "coxph"
> data(infert) # matched case-control study

> head(infert)
  education age parity induced case spontaneous stratum pooled.stratum
1    0-5yrs  26      6       1    1           2       1              3
2    0-5yrs  42      1       1    1           0       2              1
3    0-5yrs  39      6       2    1           0       3              4
4    0-5yrs  34      4       2    1           0       4              2
5   6-11yrs  35      3       1    1           1       5             32
6   6-11yrs  36      4       2    1           1       6             36


> names(infert)
[1] "education"      "age"            "parity"         "induced"       
[5] "case"           "spontaneous"    "stratum"        "pooled.stratum"
> unique(infert$education)
[1] 0-5yrs  6-11yrs 12+ yrs
Levels: 0-5yrs 6-11yrs 12+ yrs
> unique(as.numeric(infert$education))
[1] 1 2 3

> # use of function clogit
> clogit(case ~ spontaneous + induced + strata(stratum), data=infert)
            coef exp(coef) se(coef)    z       p
spontaneous 1.99      7.29    0.352 5.63 1.8e-08
induced     1.41      4.09    0.361 3.91 9.4e-05


Likelihood ratio test=53.1  on 2 df, p=2.87e-12  n= 248, number of events= 83 

> # Recode time = 1 for everyone and status = 1 for case, 0 for control. (equivalent to clogit)
> coxph(Surv(rep(1, length(case)), case) ~ spontaneous + induced + strata(stratum), data=infert)
            coef exp(coef) se(coef)    z       p
spontaneous 1.99      7.29    0.352 5.63 1.8e-08
induced     1.41      4.09    0.361 3.91 9.4e-05


Likelihood ratio test=53.1  on 2 df, p=2.87e-12  n= 248, number of events= 83 

> # Use education (a matching variable, so constant within each stratum) in numeric form as time, and status as before. (equivalent to clogit)
> coxph(Surv(as.numeric(education), case) ~ spontaneous + induced + strata(stratum), data=infert)
            coef exp(coef) se(coef)    z       p
spontaneous 1.99      7.29    0.352 5.63 1.8e-08
induced     1.41      4.09    0.361 3.91 9.4e-05


Likelihood ratio test=53.1  on 2 df, p=2.87e-12  n= 248, number of events= 83 


> # Use the row index as time and status as before. (equivalent to clogit here, since cases come before their controls in infert and so have the earliest time in their stratum)
> coxph(Surv(1:length(case), case) ~ spontaneous + induced + strata(stratum), data=infert)

            coef exp(coef) se(coef)    z       p
spontaneous 1.99      7.29    0.352 5.63 1.8e-08
induced     1.41      4.09    0.361 3.91 9.4e-05


Likelihood ratio test=53.1  on 2 df, p=2.87e-12  n= 248, number of events= 83 

But try time = sample(1:1000, size = length(case), replace = TRUE) and it won't give the same result, because any control whose time is earlier than its case's time drops out of that case's risk set.

> # Notice the difference when times are random continuous values, which changes the risk sets. (NOT equivalent to clogit)
> coxph(Surv(rnorm(length(case)), case) ~ spontaneous + induced + strata(stratum), data=infert)
            coef exp(coef) se(coef)    z       p
spontaneous 1.91      6.72    0.464 4.11 0.00004
induced     1.73      5.63    0.513 3.37 0.00076


Likelihood ratio test=31.6  on 2 df, p=1.39e-07  n= 248, number of events= 83 
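
The comparisons above can also be made programmatically rather than by eye. The following is a minimal sketch, assuming the survival package is installed. The column `rand_time` and the object names are my own, and the tolerance is an arbitrary choice. It also counts how many controls remain in their case's risk set once the times are random, which is what breaks the equivalence.

```r
library(survival)
data(infert)

f_rhs <- case ~ spontaneous + induced + strata(stratum)

# Reference fit: conditional logistic regression
fit_clogit <- clogit(f_rhs, data = infert)

# Same model through coxph with a constant time for everyone
fit_const <- coxph(Surv(rep(1, nrow(infert)), case) ~ spontaneous + induced +
                     strata(stratum), data = infert)

# Random integer times (hypothetical column, for illustration only)
set.seed(42)
infert$rand_time <- sample(1:1000, nrow(infert), replace = TRUE)
fit_rand <- coxph(Surv(rand_time, case) ~ spontaneous + induced +
                    strata(stratum), data = infert)

# Compare coefficients up to a loose numerical tolerance
all.equal(coef(fit_clogit), coef(fit_const), tolerance = 1e-6)
all.equal(coef(fit_clogit), coef(fit_rand),  tolerance = 1e-6)

# Share of controls still at risk when their matched case has its event
cases_df  <- infert[infert$case == 1, ]
case_time <- setNames(cases_df$rand_time, cases_df$stratum)
ctrl_df   <- infert[infert$case == 0, ]
kept      <- ctrl_df$rand_time >= case_time[as.character(ctrl_df$stratum)]
mean(kept)
```

With a constant time, every control stays in its case's risk set. With random times, only the share reported by `mean(kept)` does, so the fitted model is no longer the conditional logistic one.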


Acknowledgement: R-help
