[R] Updated package: survival_2.40-1

Therneau, Terry M., Ph.D. Tue, 01 Nov 2016 15:51:10 -0700

Survival version 2.40 has been relased to CRAN. This is a warning that some users may seechanges in results, however.

The heart of the issue can be shown with a simple example. Calculate the following simpleset of intervals:

<<interval1>>=
birth <- as.Date("1973/03/10")
start <- as.Date("1998/09/13") + 1:40
end   <- as.Date("1998/12/03") + rep(1:10, 4)
interval <- (end-start)
table(interval)
51 61 71 81
10 10 10 10
@

Each interval has a different start and end date, but there are only 4 unique intervals,each of which appears 10 times.

Now convert this to an age scale.

<<interval2>>=
start.age <- as.numeric(start-birth)/365.25
end.age   <- as.numeric(end  -birth)/365.25
age.interval <- end.age - start.age
table(match(age.interval, unique(age.interval)))
1 2 3 4 5 6 7 8
9 1 5 5 1 9 7 3
@

There are now eight different age intervals instead of 4, and the 8 unique values appearbetween 1 and 9 times each. Exact results likely will depend on your computer system. Wehave become a victim of round off error.

Some users prefer to use time in days and some prefer time in years, and those latterusers expect, I am sure, survival analysis results to be identical on the two scales.Both the coxph and survfit routines treat tied event times in a special way, and thisroundoff can make actual ties appear as non-tied values, however. Parametric survival suchas \code{survreg} is not affected by this issue.

In survival version 2.40 this issue has been addressed for the coxph and survfit routines;input times are subjected to the same logic found in the all.equal routine in order todetermine actual ties. The upshot is that some users may experience a changed results.

For the following test case cox1 and cox2 are identical coefficients in version 2.40, butdifferent in prior versions.

<<>>=
ndata <- data.frame(id=1:30,
                      birth.dt = rep(as.Date("1953/03/10"), 30),
                      enroll.dt= as.Date("1993/03/10") + 1:30,
                      end.dt   = as.Date("1996/10/21") + 1:30 +
                          rep(1:10, 3),
                      status= rep(0:1, length=30),
                      x = 1:30)
ndata$enroll.age <- with(ndata, as.numeric(enroll.dt - birth.dt))/365.25
ndata$end.age    <- with(ndata, as.numeric(end.dt - birth.dt))/365.25

fudays <- with(ndata, as.numeric(end.dt - enroll.dt))
fuyrs  <- with(ndata, as.numeric(end.age- enroll.age))
cox1 <- coxph(Surv(fudays, status) ~ x, data=ndata)
cox2 <- coxph(Surv(fuyrs,  status) ~ x, data=ndata)
@

This general issue of floating point precision arises often enough in R that is part ofthe frequently asked questions, see FAQ 7.31 on CRAN. The author of the survival routines(me) has always used days as the scalefor analysis -- just by habit, not for any particluarly good reason -- so the issue hadnever appeared in my work nor in the survival package's test suite. Due to user input,this issue had been addressed earlier in the survfit routine, but only when the statusvariable was 0/1, not when it is a factor.

As a final footnote, the simple data set above also gives different results when coded inSAS: I am not alone in overlooking it. As a consequence, the maintainer expects to getnew emails that ``we have found a bug in your code: it gives a different answer thanSAS''. (This is an actual quote.)


Terry Therneau

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Updated package: survival_2.40-1

Reply via email to