Thanks to both Brian and Goran for the suggestions. I've now added the following to the code:
# Round the survival times just a little. Otherwise unique() and # table() can give a different list of unique times, and both are # used in the subsequent routines. A test case with times of # sqrt(2)^2 and 2 revealed this issue. Normally double.digits is # 53 in base 2, which leads to 15 decimal digits. digits <- floor((.Machine$double.digits -1) * logb(.Machine$double.base,10)) #base 10 digits Y[,1] <- signif(Y[,1], digits) if (ncol(Y)==3) Y[,2] <- signif(Y[,2],digits) It will migrate to CRAN in due course. Terry T -----Original Message----- From: Göran Broström [mailto:g...@stat.umu.se] Sent: Thursday, July 22, 2010 3:54 PM To: Prof Brian Ripley Cc: Therneau, Terry M., Ph.D.; r-devel@r-project.org Subject: Re: [Rd] Table vs unique On 07/21/2010 03:54 PM, Prof Brian Ripley wrote: > I believe this is a misdiagnosis: the 'rounding' is done by > as.character (as the help for argument 'levels' in ?factor does say) > and ?as.character has a full explanation (and is linked from the > relevant part of ?factor). > > as.numeric(as.character()) should do the trick. Or temp <- signif(temp, digits = 15) which seems to be much faster. Göran B. > > On Wed, 21 Jul 2010, Terry Therneau wrote: > >> A bug in the survival routines was reported to me today. The root cause >> is a difference between table, unique, and sort. >> >>> temp<- rep(c(1, sqrt(2)^2, 2), 1:3) >>> unique(temp) >> [1] 1 2 2 >>> table(temp) >> temp >> 1 2 >> 1 5 >> >> I'm using 2.10 on Linux, the user reported from 2.9 on Windows. >> >> 1. Minor issue: I think the root rounding occurs in factor. I didn't >> see any discussion of this in the help page, perhaps something should be >> added. >> >> 2. The error popped up in summary.survfit but the root cause is an >> inconsistent survfit object. The survfit routine uses sort and unique >> to create the unique survival times and most of the output, but table to >> count them for another component. >> Lumping the two versions of "2.0000...." together is the preferable >> output. I think the best solution will be to preprocess the time >> variable so that the three operators are consistent. >> >> as.numeric(as.character(as.factor(time))) ? >> >> Rather ugly. But most importantly what is a guarranteed construct that >> would ensure consistency? Should we use a rounding level that is more >> or less equivalent to all.equal()? >> >> The solution will have to be incorporated into survfit, coxph, ... >> perhaps a dozen places in the survival suite so I'd like to get it right >> the first time. >> >> Terry T >> >> ______________________________________________ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel