Dear Experts,

 

I am trying to do a time-stratified case-crossover analysis on air
pollution data and number of myocardial infarctions. In order to avoid
model selection bias, I started with a simple simulation. 

 

I'm still not sure if my simulation is right. But the results I get from
the "ts-case-crossover" are much more variable than those from a glm.

 

Is this:

a.      Due to the simple relation of log-rate of mi cases being = alpha
+ beta*pm that the glm results are more precise?
b.      Due to me using method="approximate" instead of default "exact"
in the clogit?
c.      Due to the fact that the "approximate" method in clogit use
"breslow" for handling ties and not "efron"?
d.      Due to that I've misunderstood how to arrange data when doing a
case-crossover, such that there is a way of using "exact", that wouldn't
go berserk due to the many ties (see #Berserk script at bottom).
e.      The prize to pay for using a case-crossover analysis?

 

It should perhaps be noted that, because of the absence of individual
data, the exposure to air-pollution (pm10) is assumed to be common to
all individuals on a certain day. 

 

I'd be most grateful for any help and ideas on this matter.

 

Best regards,

 

Fredrik Nilsson, PhD

 

PS. I am aware of the limitations that Whitaker et al. presented in
Environmetrics 2007; 18: 157-171, but tried to use the time-stratified
case-crossover as it could be a "simple", but fairly correct way of
doing this type of analysis.

 

Simulation script:

library(survival)

n<-2*365

samp<-100

ti<-1:n

ar1<-rnorm(1)

for (i in 2:n){

  ar1[i]<-0.2*ar1[i-1]+rnorm(1)

}

 

#old version of air pollution

#pm10<-2 + .5*sin(2*pi*ti/365 + 127) + 0.1*ar1

 

startdate<-"1992-07-01"

date1<-as.Date(startdate)

dates<-date1 + (ti-1)

tyda<-weekdays(dates)

tymo<-months(dates)

tyyear<-as.character(dates)

for (i in 1:n)

{

  ask<-tyyear[i]

  tyyear[i]<-substr(ask,1,4)

}

 

 

moeff<-cumsum(c(1,31,28,31,30,31,30,31,31,30,31,30))

a<-as.Date("1994-12-31")

monthnames<-months(a+moeff)

tymofa<-match(tymo,monthnames)

 

moeff<-.5*sin(2*pi*moeff/365 + 127)

 

#new version; to have strictly stationary levels of air-pollution 

#within strata

pm10<-2 + .5*moeff[tymofa] + 0.1*ar1

 

#intercept and proportionality to pm10 coefficients of log-rate

al<- -2.5

be<- 1.4

 

 

qres<-numeric(samp)

glmres<-qres

gamres<-qres

for(q in 1:samp)

{

 

rate<-exp(al + be*pm10)

mi<-rpois(length(rate),lambda=rate)

 

 

date1<-as.Date(startdate)

dates<-date1 + (ti-1)

tyda<-weekdays(dates)

tymo<-months(dates)

tyyear<-as.character(dates)

for (i in 1:n)

{

  ask<-tyyear[i]

  tyyear[i]<-substr(ask,1,4)

}

 

#replicate cases

air<-data.frame(mi, tyda, tymo, tyyear, pm10, dates)

air$stratn<-as.numeric(strata(air$tyda, air$tymo,air$tyyear))

lest<-unique(air$stratn)

 

air$nctrl<-0

airbase<-air

 

#find the number of controls whithin each stratum

for (i in 1:length(lest))

{

   a<-which(air$stratn==lest[i])

   for (j in 1:length(a))

   {

      air$nctrl[a[j]]<-sum(air$mi[a[-j]])

   }

}

 

 

#create cases and controls

cami<-rep(1,sum(mi))

ctmi<-rep(0,sum(air$nctrl))

 

capm<-rep(pm10, mi)

ctpm<-rep(pm10, air$nctrl)

 

cast<-rep(air$stratn, mi)

ctst<-rep(air$stratn, air$nctrl)

 

cady<-rep(dates, mi)

ctdy<-rep(dates, air$nctrl)

 

cases<-c(cami, ctmi)

stranu<-c(cast, ctst)

days<-c(cady, ctdy)

pmva<-c(capm, ctpm)

 

air2<-data.frame(cases, days, pmva,stranu)

air.cl<-clogit(cases~pmva + strata(stranu), method="approximate",
data=air2)

air.glm<-glm(mi~pm10, family=poisson, data=air)

#air.gam<-gam(mi~s(pm10), family=poisson)

qres[q]<-as.numeric(coef(air.cl))

glmres[q]<-as.numeric(coef(air.glm)[2])

#gamres[q]<-as.numeric(coef(air.gam)[2])

 

}

par(mfrow=c(2,1))

plot(density(qres))

plot(density(glmres))

fivenum(qres)

fivenum(glmres)

 

# Berserk script

# this shows that even for very few cases, my intended way of doing TS
C-C is quite time-consuming.

#

#case<-c(5,0,2,1)

#airq<-c(10,2,3,3)

#

#nctrl<-0*case

#for (i in 1:length(case))

#{

#  nctrl[i]<-sum(case[-i])

#}

#

#airca<-rep(airq,case) 

#airct<-rep(airq,nctrl)

#

#cases<-rep(1,sum(case))

#ctrls<-rep(0,sum(nctrl))

#

#mi<-c(cases,ctrls)

#airq<-c(airca,airct)

#a<-Sys.time()

#air.cl<-clogit(mi~airq)

#b<-Sys.time()

#b-a

#a<-Sys.time()

#air.cl<-clogit(mi~airq, method="approximate")

#b<-Sys.time()

#b-a


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to