On Oct 5, 2012, at 8:48 PM, Omar De la Cruz C. wrote: > Hello, > > I am interested in producing the expected number of events, in a > recurring events setting. I am using the Andersen-Gill model, as fit > by the function "coxph" in the package "survival." > > I need to produce expected numbers of events for a cohort, > cumulatively, at several fixed times. My ultimate goal is: To fit an > AG model to a reference sample, then use that fitted model to generate > expected numbers of events for a new cohort; then, comparing the > expected vs. the observed numbers of events would give us some idea of > whether the new cohort differs from the reference one. > >> From my reading of the documentation and the text by Therneau and > Grambsch, it seems that the function "survexp" is what I need. But > using it I am not able to obtain expected numbers of events that match > reasonably well the observed numbers *even for the same reference > population.* So, I think I am misunderstanding something quite badly. > > Below is an example that illustrates the situation. At the end I > include the sessionInfo(). > > Thank you! > > Omar. > > > ############################ > > # Example of unexpected behavior in computing estimated number of events > # in using package "survival" for fitting the Andersen-Gill model > > require(survival) > > head(bladder2) # this is the data, in interval format > > # Fit Andersen-Gill model > cphfit = > coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2) > > # Choose some arbitrary time horizons > t.horiz = seq(min(bladder2$start),max(bladder2$stop),length=6) > > # Compute the cohort expected survival > s = survexp(~1,data=bladder2,ratetable=cphfit,times=t.horiz) > # This are the expected survival values: > s$surv > > # We are interested in the rate of events > e.r = as.vector( 1 - s$surv ) >
Rates are events/n-exposed/time, so those are not rates as I understand the term. And I do not see any accounting for the length of intervals at risk in the rest of your code. That vector does not even calculate interval event expectations as I read it. -- David > # How does this compare to the actual number of events, cumulative at > # each time horizon? > > observed = numeric(length(t.horiz)) > > for (i in 1:length(t.horiz)){ > > observed[i] = sum(bladder2$event[bladder2$stop <= t.horiz[i]]) > > } > > print(observed) > > # We would like to compute expected numbers of events that approximately > # match these observed values. > > # We should multiply the expected survival rates by the number of individuals. > > # Now, one would think that this is the number of at-risk individuals: > s$n.risk > > # But that is actually the total number of rows in the data. In any case, > # these numbers do not match: > > rbind(expected = s$n.risk*e.r,observed=observed) > > # What if we multiply by the number of individuals? > > rbind(expected = length(unique(bladder2$id))*e.r,observed=observed) > > # This does not work either! The required factor seems to be about 133, but > # I don't see an explanation for that. > > # In this example, multiplying by 133.182 gives a good match between observed > # and expected values, but in other examples even the shape of the curves > # are different. > > # Multiplying by a number of individuals at risk at each time point > # (number of individuals > # for which there is a time interval containing the time horizon) does > # not work either. > > ##################### > >> sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] splines stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] survival_2.36-14 > > loaded via a namespace (and not attached): > [1] tools_2.15.1 > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Alameda, CA, USA ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.