>>>>> "HenrikB" == Henrik Bengtsson <[EMAIL PROTECTED]> >>>>> on Mon, 4 Aug 2008 21:14:12 -0700 writes:
HenrikB> Hi, HenrikB> I just want to do a follow up this very simple HenrikB> fix/correction/speedup/cleanup of the base::which() function. Here is HenrikB> a diff: HenrikB> diff src/library/base/R/which.R which.R HenrikB> 21c21 HenrikB> < wh <- seq_along(x)[ll <- x & !is.na(x)] HenrikB> --- >> wh <- seq_along(x)[x & !is.na(x)] HenrikB> 25c25 HenrikB> < names(wh) <- names(x)[ll] HenrikB> --- >> names(wh) <- names(x)[wh] HenrikB> FYI, the 'll' variable is not used elsewhere. I've been going through HenrikB> this modifications several times and I cannot see any side effects. HenrikB> Could someone of R core please commit this? I had added your proposition to my version of R-devel in order to commit it, and had wanted to do my own performance tests under different scenarios, but I had forgotten / postponed it. {I have more such things , notably the "help.request() from Kate Mullen -- with quite a few of my own changes, not quite finished ... that will have to wait for after useR!2008 ..} In fact, it seems is pretty obvious that the version with [wh] instead of [ll] should be faster in most cases, and never slower, and so I do commit it now. Thank you Henrik, for the reminder. Martin HenrikB> BTW, when one report diff:s, do you prefer to get it with or without HenrikB> context information, e.g. -C 3? {My exact preference would depend on the size / style of the patch itself. It does not really matter, and as a general rule, I'd personally prefer '-u' (unified diffs which include context)} HenrikB> /Henrik HenrikB> On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry <[EMAIL PROTECTED]> wrote: >> On Thu, 10 Jul 2008, Henrik Bengtsson wrote: >> >>> Hi, >>> >>> by replacing 'll' with 'wh' in the source code for base::which() one >>> gets ~20% speed up for *named logical vectors*. >> >> >> The amount of speedup depends on how sparse the TRUE values are. >> >> When the proportion of TRUEs gets small the speedup is more than twofold on >> my macbook. For high proportions of TRUE, the speedup is more like the 20% >> you cite. >> >> HTH, >> >> Chuck >> >>> >>> CURRENT CODE: >>> >>> which <- function(x, arr.ind = FALSE) >>> { >>> if(!is.logical(x)) >>> stop("argument to 'which' is not logical") >>> wh <- seq_along(x)[ll <- x & !is.na(x)] >>> m <- length(wh) >>> dl <- dim(x) >>> if (is.null(dl) || !arr.ind) { >>> names(wh) <- names(x)[ll] >>> } >>> ... >>> wh; >>> } >>> >>> SUGGESTED CODE: (Remove 'll' and use 'wh') >>> >>> which2 <- function(x, arr.ind = FALSE) >>> { >>> if(!is.logical(x)) >>> stop("argument to 'which' is not logical") >>> wh <- seq_along(x)[x & !is.na(x)] >>> m <- length(wh) >>> dl <- dim(x) >>> if (is.null(dl) || !arr.ind) { >>> names(wh) <- names(x)[wh] >>> } >>> ... >>> wh; >>> } >>> >>> That's all. >>> >>> BENCHMARKING: >>> >>> # To measure both in same environment >>> which1 <- base::which; >>> environment(which1) <- globalenv(); # Needed? >>> >>> N <- 1e6; >>> set.seed(0xbeef); >>> x <- sample(c(TRUE, FALSE), size=N, replace=TRUE); >>> names(x) <- seq_along(x); >>> B <- 10; >>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); }); >>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); }); >>> stopifnot(identical(idxs1, idxs2)); >>> print(t1/t2); >>> # Fair benchmarking >>> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); }); >>> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); }); >>> print(t1/t2); >>> ## user system elapsed >>> ## 1.283186 1.052632 1.250000 >>> >>> You get similar results if you put for loop outside the system.time() >>> call (and sum up the timings). >>> >>> Cheers >>> >>> Henrik >>> >>> ______________________________________________ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >> >> Charles C. Berry (858) 534-2098 >> Dept of Family/Preventive >> Medicine >> E mailto:[EMAIL PROTECTED] UC San Diego >> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 >> >> >> HenrikB> ______________________________________________ HenrikB> R-devel@r-project.org mailing list HenrikB> https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel