Hi, thanks for this. I'll use "unified" diff next time, i.e.
diff -u <current>.R <new>.R /Henrik On Tue, Aug 5, 2008 at 5:54 AM, Martin Maechler <[EMAIL PROTECTED]> wrote: >>>>>> "HenrikB" == Henrik Bengtsson <[EMAIL PROTECTED]> >>>>>> on Mon, 4 Aug 2008 21:14:12 -0700 writes: > > HenrikB> Hi, > > HenrikB> I just want to do a follow up this very simple > HenrikB> fix/correction/speedup/cleanup of the base::which() function. > Here is > HenrikB> a diff: > > HenrikB> diff src/library/base/R/which.R which.R > HenrikB> 21c21 > HenrikB> < wh <- seq_along(x)[ll <- x & !is.na(x)] > HenrikB> --- > >> wh <- seq_along(x)[x & !is.na(x)] > HenrikB> 25c25 > HenrikB> < names(wh) <- names(x)[ll] > HenrikB> --- > >> names(wh) <- names(x)[wh] > > HenrikB> FYI, the 'll' variable is not used elsewhere. I've been going > through > HenrikB> this modifications several times and I cannot see any side > effects. > > HenrikB> Could someone of R core please commit this? > > I had added your proposition to my version of R-devel in order > to commit it, and had wanted to do my own performance tests > under different scenarios, but I had forgotten / postponed it. > {I have more such things , notably the "help.request() from Kate > Mullen -- with quite a few of my own changes, not quite > finished ... that will have to wait for after useR!2008 ..} > > In fact, it seems is pretty obvious that the version with [wh] > instead of [ll] should be faster in most cases, and never > slower, > and so I do commit it now. > > Thank you Henrik, for the reminder. > > Martin > > HenrikB> BTW, when one report diff:s, do you prefer to get it with or > without > HenrikB> context information, e.g. -C 3? > > {My exact preference would depend on the size / style of the > patch itself. It does not really matter, and as a general rule, > I'd personally prefer '-u' (unified diffs which include context)} > > HenrikB> /Henrik > > HenrikB> On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry <[EMAIL > PROTECTED]> wrote: > >> On Thu, 10 Jul 2008, Henrik Bengtsson wrote: > >> > >>> Hi, > >>> > >>> by replacing 'll' with 'wh' in the source code for base::which() one > >>> gets ~20% speed up for *named logical vectors*. > >> > >> > >> The amount of speedup depends on how sparse the TRUE values are. > >> > >> When the proportion of TRUEs gets small the speedup is more than > twofold on > >> my macbook. For high proportions of TRUE, the speedup is more like the > 20% > >> you cite. > >> > >> HTH, > >> > >> Chuck > >> > >>> > >>> CURRENT CODE: > >>> > >>> which <- function(x, arr.ind = FALSE) > >>> { > >>> if(!is.logical(x)) > >>> stop("argument to 'which' is not logical") > >>> wh <- seq_along(x)[ll <- x & !is.na(x)] > >>> m <- length(wh) > >>> dl <- dim(x) > >>> if (is.null(dl) || !arr.ind) { > >>> names(wh) <- names(x)[ll] > >>> } > >>> ... > >>> wh; > >>> } > >>> > >>> SUGGESTED CODE: (Remove 'll' and use 'wh') > >>> > >>> which2 <- function(x, arr.ind = FALSE) > >>> { > >>> if(!is.logical(x)) > >>> stop("argument to 'which' is not logical") > >>> wh <- seq_along(x)[x & !is.na(x)] > >>> m <- length(wh) > >>> dl <- dim(x) > >>> if (is.null(dl) || !arr.ind) { > >>> names(wh) <- names(x)[wh] > >>> } > >>> ... > >>> wh; > >>> } > >>> > >>> That's all. > >>> > >>> BENCHMARKING: > >>> > >>> # To measure both in same environment > >>> which1 <- base::which; > >>> environment(which1) <- globalenv(); # Needed? > >>> > >>> N <- 1e6; > >>> set.seed(0xbeef); > >>> x <- sample(c(TRUE, FALSE), size=N, replace=TRUE); > >>> names(x) <- seq_along(x); > >>> B <- 10; > >>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); }); > >>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); }); > >>> stopifnot(identical(idxs1, idxs2)); > >>> print(t1/t2); > >>> # Fair benchmarking > >>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); }); > >>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); }); > >>> print(t1/t2); > >>> ## user system elapsed > >>> ## 1.283186 1.052632 1.250000 > >>> > >>> You get similar results if you put for loop outside the system.time() > >>> call (and sum up the timings). > >>> > >>> Cheers > >>> > >>> Henrik > >>> > >>> ______________________________________________ > >>> R-devel@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-devel > >>> > >> > >> Charles C. Berry (858) 534-2098 > >> Dept of Family/Preventive > >> Medicine > >> E mailto:[EMAIL PROTECTED] UC San Diego > >> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego > 92093-0901 > >> > >> > >> > > HenrikB> ______________________________________________ > HenrikB> R-devel@r-project.org mailing list > HenrikB> https://stat.ethz.ch/mailman/listinfo/r-devel > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel