[Rd] survival package: Surv handles invalid intervals (start time > stop (PR#14221)
This is a multi-part message in MIME format. --080605010703060205070700 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit In the latest version of the survival package (2.35-8), the Surv function handles invalid intervals (where start time is greater than stop time) by issuing a warning that NAs have been created and then setting the left endpoint of the offending intervals to NA. However, the code that sets the endpoint to NA subsets incorrectly, and in some circumstances an arbitrary selection of intervals will have an endpoint set to NA. For example, for the interval/event data: intervalevent (NA, 10]1 (1, 5]1 (6, 4]1 the appropriate Surv call **should** result in a warning and the left endpoint of the third, invalid interval being set to NA, as here: > Surv(c(NA,1,6),c(10,5,4),c(1,1,1)) [1] (NA,10 ] ( 1, 5 ] (NA, 4 ] Warning message: In Surv(c(NA, 1, 6), c(10, 5, 4), c(1, 1, 1)) : Stop time must be> start time, NA created > However, the Surv call **actually** results in: > Surv(c(NA,1,6), c(10,5,4), c(1,1,1)) [1] (NA,10 ] (NA, 5 ] ( 6, 4 ] Warning message: In Surv(c(NA, 1, 6), c(10, 5, 4), c(1, 1, 1)) : Stop time must be> start time, NA created > Note that the endpoint of the valid, second interval has been set to NA in place of the invalid, third interval. A similar problem exists for type="interval2" type data. The **expected** behavior is: > Surv(c(NA,1,6), c(10,5,4), type="interval2") [1] 10- [ 1, 5] [NA, 4] Warning message: In Surv(c(NA, 1, 6), c(10, 5, 4), type = "interval2") : Invalid interval: start> stop, NA created > but the **actual** behavior is: > Surv(c(NA,1,6), c(10,5,4), type="interval2") [1] 10- [NA, 5] [ 6, 4] Warning message: In Surv(c(NA, 1, 6), c(10, 5, 4), type = "interval2") : Invalid interval: start> stop, NA created > The attached patch fixes the problem. -- Kevin BuhrPhone: +1 608 265 4587 Assistant ScientistFax: +1 608 263 0415 Statistical Data Analysis Center Room 211, WARF Office Building, 610 Walnut St., Madison, WI 53726-2397 --080605010703060205070700 Content-Type: text/x-patch; name="Surv-subset-bug.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="Surv-subset-bug.patch" diff --git a/survival/R/Surv.S b/survival/R/Surv.S index 9257ea0..50b3b85 100644 --- a/survival/R/Surv.S +++ b/survival/R/Surv.S @@ -56,7 +56,7 @@ Surv <- function(time, time2, event, if (!is.numeric(time2)) stop("Stop time is not numeric") who3 <- !(is.na(time) | is.na(time2)) if (any (time[who3]>= time2[who3])) { - time[time[who3]>= time2[who3]] <- NA + time[who3][time[who3]>= time2[who3]] <- NA warning("Stop time must be > start time, NA created") } if (is.logical(event)) status <- as.numeric(event) @@ -105,7 +105,7 @@ Surv <- function(time, time2, event, temp <- (time[status==3] > time2[status==3]) if (any(temp & !is.na(temp))) { - time[temp] <- NA + time[status==3][temp] <- NA warning("Invalid interval: start > stop, NA created") } --080605010703060205070700-- __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] O(N log N) impl of Kendall's Tau
I've attached an O(N log N) implementation of Kendall's Tau, which I hope will eventually replace the O(N^2) version currently implemented in R. For N = 30,000 it's several hundred times faster than the old version. The core algorithm comes with a lot of tests, which are included in the kendall.c file. However, I've not yet integrated this code into the rest of R because, honestly, the code in cor.c is inscrutable and intermingles computing Kendall's Tau with dealing with missing values and computing other kinds of correlation. I'd like one of the core devs' help with the integration. The details of the algorithm and how to use it are explained in the comments inside kendall.c. Please let me know what else I can do to help get this improvement folded into R. --David Simcha /* This file contains code to calculate Kendall's Tau in O(N log N) time in * a manner similar to the following reference: * * A Computer Method for Calculating Kendall's Tau with Ungrouped Data * William R. Knight Journal of the American Statistical Association, Vol. 61, * No. 314, Part 1 (Jun., 1966), pp. 436-439 * * Copyright 2010 David Simcha * * License: * Boost Software License - Version 1.0 - August 17th, 2003 * * Permission is hereby granted, free of charge, to any person or organization * obtaining a copy of the software and accompanying documentation covered by * this license (the "Software") to use, reproduce, display, distribute, * execute, and transmit the Software, and to prepare derivative works of the * Software, and to permit third-parties to whom the Software is furnished to * do so, all subject to the following: * * The copyright notices in the Software and this entire statement, including * the above license grant, this restriction and the following disclaimer, * must be included in all copies of the Software, in whole or in part, and * all derivative works of the Software, unless such copies or derivative * works are solely in the form of machine-executable object code generated by * a source language processor. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT * SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE * FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER * DEALINGS IN THE SOFTWARE. * */ #include #include #include #include uint64_t insertionSort(double*, size_t); #define kendallTest #ifdef kendallTest #include #include #include /* Kludge: In testing mode, just forward R_rsort to insertionSort to make this * module testable without having to include (and compile) a bunch of other * stuff. */ void R_rsort(double* arr, int len) { insertionSort(arr, len); } #else #include /* For R_rsort. */ #endif /* Sorts in place, returns the bubble sort distance between the input array * and the sorted array. */ uint64_t insertionSort(double* arr, size_t len) { size_t maxJ, i; uint64_t swapCount = 0; if(len < 2) { return 0; } maxJ = len - 1; for(i = len - 2; i < len; --i) { size_t j = i; double val = arr[i]; for(; j < maxJ && arr[j + 1] < val; ++j) { arr[j] = arr[j + 1]; } arr[j] = val; swapCount += (j - i); } return swapCount; } static uint64_t merge(double* from, double* to, size_t middle, size_t len) { size_t bufIndex, leftLen, rightLen; uint64_t swaps; double* left; double* right; bufIndex = 0; swaps = 0; left = from; right = from + middle; rightLen = len - middle; leftLen = middle; while(leftLen && rightLen) { if(right[0] < left[0]) { to[bufIndex] = right[0]; swaps += leftLen; rightLen--; right++; } else { to[bufIndex] = left[0]; leftLen--; left++; } bufIndex++; } if(leftLen) { memcpy(to + bufIndex, left, leftLen * sizeof(double)); } else if(rightLen) { memcpy(to + bufIndex, right, rightLen * sizeof(double)); } return swaps; } /* Sorts in place, returns the bubble sort distance between the input array * and the sorted array. */ uint64_t mergeSort(double* x, double* buf, size_t len) { uint64_t swaps; size_t half; if(len < 10) { return insertionSort(x, len); } swaps = 0; if(len < 2) { return 0; } half = len / 2; swaps += mergeSort(x, buf, half); swaps += mergeSort(x + half, buf + half, len - half); swaps += merge(x, buf, half, len); memcpy(x, buf, len * sizeof(double)); return swaps; } static uint64_t getMs(double* data, size_t len) { /* Assumes data is sorted.*/
Re: [Rd] [SoC] R and Google Summer of Code 2010: Call for Proposals
> On Tue, 23 Feb 2010 22:27:46 -0600, > Dirk Eddelbuettel (DE) wrote: > On 23 February 2010 at 22:06, hadley wickham wrote: > | > So let us all start now by proposing some ideas for 2010. It may make sense > | > to centralize and standardize this on wiki.r-project.org but until we have > | > that process sorted out lets just post ideas here on r-devel with a [SoC] tag > | > in the Subject: line. > | > | One idea per email? > Sure, works as a start. Or mayb better as one wiki.r-project.org entry per > idea because then we can all edit these rather having to noodle through > emails. But I am unsure where to 'hang this' in the wiki namespace. We > could also create a one-off at wikispaces.com or wikidot.comIf you have > another scratch wiki somewhere I think wiki.r-project.org would be a great place, I'd "hang it" into the developers section. Then we could have a look at GSoC 2010 ideas when GSoC 2011 arrives ... Best, Fritz __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [SoC] R and Google Summer of Code 2010: Call for Proposals
Hello, I have created the "projects" subsection in "developers". The Google Summer of Code 2010 is at http://rwiki.sciviews.org/doku.php?id=developers:projects:googlesummer2010 Best, Philippe ..<°}))>< ) ) ) ) ) ( ( ( ( (Prof. Philippe Grosjean ) ) ) ) ) ( ( ( ( (Numerical Ecology of Aquatic Systems ) ) ) ) ) Mons University, Belgium ( ( ( ( ( .. On 24/02/10 10:50, Friedrich Leisch wrote: On Tue, 23 Feb 2010 22:27:46 -0600, Dirk Eddelbuettel (DE) wrote: > On 23 February 2010 at 22:06, hadley wickham wrote: > |> So let us all start now by proposing some ideas for 2010. It may make sense > |> to centralize and standardize this on wiki.r-project.org but until we have > |> that process sorted out lets just post ideas here on r-devel with a [SoC] tag > |> in the Subject: line. > | > | One idea per email? > Sure, works as a start. Or mayb better as one wiki.r-project.org entry per > idea because then we can all edit these rather having to noodle through > emails. But I am unsure where to 'hang this' in the wiki namespace. We > could also create a one-off at wikispaces.com or wikidot.comIf you have > another scratch wiki somewhere I think wiki.r-project.org would be a great place, I'd "hang it" into the developers section. Then we could have a look at GSoC 2010 ideas when GSoC 2011 arrives ... Best, Fritz __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Typo in the man page of ?cutree (PR#14223)
In the examples section of ?cutree (stats package): hc <- hclust(dist(USArrests)) cutree(hc, k=1:5)#k = 1 is trivial cutree(hc, h=250) ## Compare the 2 and 3 grouping: <<= SHOULD READ AS: "2 and 4 grouping" g24 <- cutree(hc, k = c(2,4)) table(g24[,"2"], g24[,"4"]) __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Suppressing a warning from library()
I get a warning about "1 warning message" using R CMD check on the survival library. It comes from the code snippet: if (library(cmprsk, logical.return=TRUE)) { # further test of competing risks from survfit . . } This is a very useful additional test when I'm checking any changes to the affected code, so I like having this in my test suite even though it doesn't run automatically. This argues for a change in library -- when the user sets logical.return =T they don't need a warning message too. I submit this to the R core and their collective wisdom. I admit that mine is an unusual case, and for now I'll turn if off with options(warn=-1) Terry T. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [SoC] R and Google Summer of Code 2010: Call for Proposals
On 24 February 2010 at 11:09, Philippe Grosjean wrote: | I have created the "projects" subsection in "developers". The Google | Summer of Code 2010 is at | http://rwiki.sciviews.org/doku.php?id=developers:projects:googlesummer2010 We dediced to make this http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010 to stick with the common 'GSoC' acronym. So go ahead and submit ideas! Dirk -- Registration is open for the 2nd International conference R / Finance 2010 See http://www.RinFinance.com for details, and see you in Chicago in April! __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suppressing a warning from library()
Terry Therneau wrote: I get a warning about "1 warning message" using R CMD check on the survival library. It comes from the code snippet: if (library(cmprsk, logical.return=TRUE)) { # further test of competing risks from survfit . . } This is a very useful additional test when I'm checking any changes to the affected code, so I like having this in my test suite even though it doesn't run automatically. This argues for a change in library -- when the user sets logical.return =T they don't need a warning message too. I submit this to the R core and their collective wisdom. I admit that mine is an unusual case, and for now I'll turn if off with options(warn=-1) Probably, wrapping in suppressWarnings() is better. -p -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] bug in "R Help on 'rmultinom()'" (PR#14222)
Full_Name: Jan Somorcik Version: OS: Submission from: (NULL) (158.195.31.33) The explanation of "rmultinom()" in R Help contains an obvious bug. The original statement "The 'rmultinom()' algorithm draws binomials from Bin(n[j], P[j]) sequentially, where n[1] = N (N := 'size'), P[1] = p[1] (p is 'prob' scaled to sum 1), and for j >= 2, recursively n[j]= N - sum(k=1, .., j-1) n[k] and P[j]= p[j] / (1 - sum(p[1:(j-1)]))." should be corrected, e.g. as follows: "...draws binomials X[j] from.., recursively n[j]= N - sum(k=1, .., j-1) X[k]" __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suppressing a warning from library()
On Wed, 24 Feb 2010, Terry Therneau wrote: I get a warning about "1 warning message" using R CMD check on the survival library. It comes from the code snippet: if (library(cmprsk, logical.return=TRUE)) { # further test of competing risks from survfit . . } This is a very useful additional test when I'm checking any changes to the affected code, so I like having this in my test suite even though it doesn't run automatically. This argues for a change in library -- when the user sets logical.return =T they don't need a warning message too. It was the considered opinion that in general they do. You can always use suppressWarnings() in your code if you don't, or you can pre-check that the package is available via lower-level functions such as .packages and .find.package. I submit this to the R core and their collective wisdom. I admit that mine is an unusual case, and for now I'll turn if off with options(warn=-1) Terry T. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suppressing a warning from library()
The suppressWarnings construct looks like a good idea; I was not aware of it's existence. Thanks for the feedback. Terry T. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] grid unit bug? (PR#14220)
Paul: I figured that would be the problem. I encountered the issue when I tries to check arguments in a validDetails method for a grob. Could one substitute the following function in the grid namespace? is.numeric <- function(x)if(is.unit(x))TRUE else is.numeric(x) (or make the first clause FALSE) Obviously, messing around like this might be dangerous -- or at least would compromise execution speed. Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics -Original Message- From: paul murrell [mailto:r-b...@r-project.org] Sent: Wednesday, February 24, 2010 4:22 PM To: gunter.ber...@gene.com Subject: Re: grid unit bug? (PR#14220) > The following seems to me to be at least a perverse trap, if not an = > outright > bug: > >> is.numeric(unit(1,"npc")) > [1] TRUE >> is.numeric(1*unit(1,"npc")) > [1] FALSE >> is.numeric(unit(0,"npc") +unit(1,"npc")) > [1] FALSE > > ...etc. > i.e. is.numeric() appears to be TRUE for class "unit" but false for = > class > ("unit.arithmetic" "unit" ). Seems to me it ought to b the same for = > both. These results simply reflect the underlying data structures (simple "unit"s are just numeric vectors, but more complex "unit.arithmetic"s are lists). I don't see how I can "hide" the numeric-ness of simple units (just like there's no way to stop a "ts" object like 'Nile' from satisfying is.numeric()). I could re-implement simple units as lists, but (apart from the work involved) that would be (even) less efficient. 1. Is there a situation where this causes a problem? 2. Do you have a possible "fix" in mind? Paul > > Bert Gunter > Genentech Nonclinical Biostatistics > > (FWIW, I think grid graphics is brilliant!) > > This was R version 2.11.0dev for Windows btw (not that it makes a > difference): > > sessionInfo() > > R version 2.11.0 Under development (unstable) (2010-02-15 r51142)=20 > i386-pc-mingw32=20 > > locale: > [1] LC_COLLATE=3DEnglish_United States.1252=20 > [2] LC_CTYPE=3DEnglish_United States.1252 =20 > [3] LC_MONETARY=3DEnglish_United States.1252 > [4] LC_NUMERIC=3DC =20 > [5] LC_TIME=3DEnglish_United States.1252 =20 > > attached base packages: > [1] datasets splines grid tcltk stats graphics = > grDevices > [8] utils methods base=20 > > other attached packages: > [1] TinnR_1.0.3 R2HTML_1.59-1 Hmisc_3.7-0 survival_2.35-8 > [5] svSocket_0.9-48 lattice_0.18-3 MASS_7.3-5=20 > > loaded via a namespace (and not attached): > [1] cluster_1.12.1 svMisc_0.9-56 > > > > =A0 > =A0 > > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] proto and baseenv()
I understand why the following happens ($.proto delegates to get, which ascends the parent hierarchy up to globalenv()), but I still find it anti-intuitive: > z <- 1 > y <- proto(a=2) > y$z [1] 1 Although this is well-documented behavior; wouldn't it uphold the principle of least surprise to inherit instead from baseenv() or emptyenv()? (See attached patch.) Spurious parent definitions have already been the source of bizarre bugs, prompting me to use proto like this: > z <- 1 > y <- proto(baseenv(), a=2) > y$z Error in get(x, env = this, inherits = inh) : object 'z' not found It's cumbersome, but it ensures that parent definitions don't pollute my object space; I would rather that be the default behaviour, though. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] proto and baseenv()
Wow, thanks for the heads-up. That is horrible behavior. But using baseenv() doesn't seem like the solution either. I'm new to proto, but it seems like this is also a big drawback: > z <- 1 > proto(baseenv(), expr={a=z})$a Error in eval(expr, envir, enclos) : object "z" not found -- Ben - Original message - From: Peter Danenberg To: r-devel@r-project.org Date: Wed, 24 Feb 2010 22:38:54 -0600 I understand why the following happens ($.proto delegates to get, which ascends the parent hierarchy up to globalenv()), but I still find it anti-intuitive: > z <- 1 > y <- proto(a=2) > y$z [1] 1 Although this is well-documented behavior; wouldn't it uphold the principle of least surprise to inherit instead from baseenv() or emptyenv()? (See attached patch.) Spurious parent definitions have already been the source of bizarre bugs, prompting me to use proto like this: > z <- 1 > y <- proto(baseenv(), a=2) > y$z Error in get(x, env = this, inherits = inh) : object 'z' not found It's cumbersome, but it ensures that parent definitions don't pollute my object space; I would rather that be the default behaviour, though. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] grid unit bug? (PR#14220)
is.numeric() is generic. So grid could include is.numeric.unit <- function(x) FALSE and register it in its namespace. Or Bert could define it in his application. On Thu, 25 Feb 2010, gunter.ber...@gene.com wrote: Paul: I figured that would be the problem. I encountered the issue when I tries to check arguments in a validDetails method for a grob. Could one substitute the following function in the grid namespace? is.numeric <- function(x)if(is.unit(x))TRUE else is.numeric(x) (or make the first clause FALSE) Obviously, messing around like this might be dangerous -- or at least would compromise execution speed. Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics -Original Message- From: paul murrell [mailto:r-b...@r-project.org] Sent: Wednesday, February 24, 2010 4:22 PM To: gunter.ber...@gene.com Subject: Re: grid unit bug? (PR#14220) The following seems to me to be at least a perverse trap, if not an = outright bug: is.numeric(unit(1,"npc")) [1] TRUE is.numeric(1*unit(1,"npc")) [1] FALSE is.numeric(unit(0,"npc") +unit(1,"npc")) [1] FALSE ...etc. i.e. is.numeric() appears to be TRUE for class "unit" but false for = class ("unit.arithmetic" "unit" ). Seems to me it ought to b the same for = both. These results simply reflect the underlying data structures (simple "unit"s are just numeric vectors, but more complex "unit.arithmetic"s are lists). I don't see how I can "hide" the numeric-ness of simple units (just like there's no way to stop a "ts" object like 'Nile' from satisfying is.numeric()). I could re-implement simple units as lists, but (apart from the work involved) that would be (even) less efficient. 1. Is there a situation where this causes a problem? 2. Do you have a possible "fix" in mind? Paul Bert Gunter Genentech Nonclinical Biostatistics (FWIW, I think grid graphics is brilliant!) This was R version 2.11.0dev for Windows btw (not that it makes a difference): sessionInfo() R version 2.11.0 Under development (unstable) (2010-02-15 r51142)=20 i386-pc-mingw32=20 locale: [1] LC_COLLATE=3DEnglish_United States.1252=20 [2] LC_CTYPE=3DEnglish_United States.1252 =20 [3] LC_MONETARY=3DEnglish_United States.1252 [4] LC_NUMERIC=3DC =20 [5] LC_TIME=3DEnglish_United States.1252 =20 attached base packages: [1] datasets splines grid tcltk stats graphics = grDevices [8] utils methods base=20 other attached packages: [1] TinnR_1.0.3 R2HTML_1.59-1 Hmisc_3.7-0 survival_2.35-8 [5] svSocket_0.9-48 lattice_0.18-3 MASS_7.3-5=20 loaded via a namespace (and not attached): [1] cluster_1.12.1 svMisc_0.9-56 =A0 =A0 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel