Re: [R] results of a survival analysis change when converting the data to counting process format

Ferenci Tamas Fri, 23 Aug 2019 07:40:32 -0700

Thanks for all those comments, and thanks Terry for the repair!

Tamas



2019. augusztus 23., 14:40:05, írtad:


I've spent some time chasing this down, and another error of similar type.   
It's repaired in version 3.0-9  (soon to go to CRAN, or at least that is the 
plan  --- I'm working through one last fail in the checks.) 

For those who want to know more---

When there is start, stop data such as this:  (0, 101]  (101, 123]   (123, 
400], ....  the algorithm used by coxph is to work from largest to smallest 
time.  Every time it hits the start of an interval (400, 123, 101) that 
observation is added into the risk set, and whenever it hits the start of an 
interval the observation is removed (123, 101, 0).    "Add" and "remove" also 
means "add to the current estimate of the running mean and variance" of the 
covariates.   At each event time that current mean and variance are are used to 
update the loglik and its derivatives.

When a subject is diced into little peices of (1,2] (2,3] (3,4], ....  like the 
last fit below, the sheer number of updates to the running mean/variance can 
lead to accumulated round off error.   The code works hard to avoid this, and 
some of that is subtle --- round off error is a tricky business --- and has 
gone through several refinements.    Nevertheless, the result for this data set 
should not have been that far off.  Another user example from about the same 
time was worse so I've been focusing on that one: one of the two splits led to 
an exp() overflow and one didn't, giving results that were completely 
different.  This led to a more careful review and some changes that addressed 
the example below as well.

Terry T.

On 8/23/19 5:00 AM, r-help-requ...@r-project.org wrote:

On 2019-08-18 19:10, Ferenci Tamas wrote:

Dear All,



Consider the following simple example:



library( survival )

data( veteran )



coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) )

          trt        prior        karno

  0.180197194 -0.005550919 -0.033771018



Note that we have neither time-dependent covariates, nor time-varying

coefficients, so the results should be the same if we change to

counting process format, no matter where we cut the times.



That's true if we cut at event times:



veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno,

                        data = veteran, cut = unique( veteran$time ) )



coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran2 ) )

          trt        prior        karno

  0.180197194 -0.005550919 -0.033771018



But quite interestingly not true, if we cut at every day:



veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,

                        data = veteran, cut = 1:max(veteran$time) )



coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran3 ) )

          trt        prior        karno

  0.180197215 -0.005550913 -0.033771016



The difference is not large, but definitely more than just a rounding

error, or something like that.



What's going on? How can the results get wrong, especially by

including more cutpoints?

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] results of a survival analysis change when converting the data to counting process format

Reply via email to