I have a follow-up question (for either Dr. Therneau, or anybody who might know).
sum(zz) (see below) estimates the number of events for the cohort. Now, how can I compute a confidence interval for sum(zz)? Or a standard error? My obvious choice, square root of the sum of the squares of the standard errors for zz provided by predict.coxph, turns out to be too small. Thank you for any suggestions. Omar. On Thu, Oct 11, 2012 at 1:26 PM, Omar De la Cruz C. <odelacr...@gmail.com> wrote: > Thank you, Dr. Therneau, that was very helpful. > > Best regards, > > Omar. > > > On Mon, Oct 8, 2012 at 9:58 AM, Terry Therneau <thern...@mayo.edu> wrote: >> >>> I am interested in producing the expected number of events, in a >>> recurring events setting. I am using the Andersen-Gill model, as fit >>> by the function "coxph" in the package "survival." >>> >>> I need to produce expected numbers of events for a cohort, >>> cumulatively, at several fixed times. My ultimate goal is: To fit an >>> AG model to a reference sample, then use that fitted model to generate >>> expected numbers of events for a new cohort; then, comparing the >>> expected vs. the observed numbers of events would give us some idea of >>> whether the new cohort differs from the reference one. >>> >>>> From my reading of the documentation and the text by Therneau and >>> >>> Grambsch, it seems that the function "survexp" is what I need. But >>> using it I am not able to obtain expected numbers of events that match >>> reasonably well the observed numbers *even for the same reference >>> population.* So, I think I am misunderstanding something quite badly. >>> >> >> You've hit a common confusion. Observed versus expected events >> computations are done on a cumulative hazard scale H, not the surivival >> scale S; S = exp(-H). Relating this back to simple Poisson models H(t) >> would be the expected number of events by time t and S(t) the probability of >> "no events before time t". G. Berry (Biometrics 1983) has a classic ane >> readable article on this (especially if you ignore the proofs). >> >> Using your example: >> >>> cphfit <- >>> coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2) >>> zz <- predict(cphfit, type='expected') >>> c(sum(zz), sum(bladder2$event)) >> [1] 112 112 >> >>> tdata <- bladder2[1:10] #new data set (lazy way) >>> predict(cphfit, type='expected', newdata=tdata) >> [1] 0.0324089 0.3226540 0.4213402 1.0560768 0.6702130 0.2163531 0.6490665 >> [8] 0.8864808 0.2932915 0.5190647 >> >> >> You can also do this using survexp and the cohort=FALSE argument, which >> would return S(t) for each subject and we would then use -log(result) to get >> H. This is how it was done when I wrote the book, but the newer predict >> function is easier. >> >> Terry Therneau >> >> ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.