Hmmm, yes, there must be some special case in the C code to avoid recycling a length-1 logical vector:
dims <- c(4, 4, 4, 1e5)
arr <- array(rnorm(prod(dims)), dims)
dim(arr)
#> [1]      4      4      4 100000
i <- c(1, 3)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)[c("expression", "min", "mean", "max")]
#> # A tibble: 2 x 4
#>   expression                    min     mean      max
#>   <chr>                    <bch:tm> <bch:tm> <bch:tm>
#> 1 arr[i, TRUE, TRUE, TRUE]   41.8ms   43.6ms   46.5ms
#> 2 arr[i, , , ]               41.7ms   43.1ms   46.3ms

On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles <ccbe...@ucsd.edu> wrote:
>
>
>> On Jun 8, 2018, at 11:52 AM, Hadley Wickham <h.wick...@gmail.com> wrote:
>>
>> On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles <ccbe...@ucsd.edu> wrote:
>>>
>>>
>>>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:
>>>>
>>>> Also the TRUEs cause problems if some dimensions are 0:
>>>>
>>>> > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>>>> Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>>>>   (subscript) logical subscript too long
>>>
>>> OK. But this is easy enough to handle.
>>>
>>>>
>>>> H.
>>>>
>>>> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>>>>> I suspect this will have suboptimal performance since the TRUEs will
>>>>> get recycled. (Maybe there is, or could be, ALTREP, support for
>>>>> recycling)
>>>>> Hadley
>>>
>>>
>>> AFAICS, it is not an issue.
>>> Taking
>>>
>>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>>>
>>> as a test case
>>>
>>> and using a function that will either use the literal code
>>> `x[i,,,,drop=FALSE]' or `eval(mc)':
>>>
>>> subset_ROW4 <-
>>>     function(x, i, useLiteral=FALSE)
>>> {
>>>     literal <- quote(x[i,,,,drop=FALSE])
>>>     mc <- quote(x[i])
>>>     nd <- max(1L, length(dim(x)))
>>>     mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
>>>     mc[["drop"]] <- FALSE
>>>     if (useLiteral)
>>>         eval(literal)
>>>     else
>>>         eval(mc)
>>> }
>>>
>>> I get identical times with
>>>
>>> system.time(for (i in 1:10000)
>>>     subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>>>
>>> and with
>>>
>>> system.time(for (i in 1:10000)
>>>     subset_ROW4(arr,seq(1,length=10,by=100),FALSE))
>>
>> I think that's because you used a relatively low precision timing
>> mechanism, and included the index generation in the timing. I see:
>>
>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>> i <- seq(1,length = 10, by = 100)
>>
>> bench::mark(
>>   arr[i, TRUE, TRUE, TRUE],
>>   arr[i, , , ]
>> )
>> #> # A tibble: 2 x 1
>> #>   expression        min    mean   median      max  n_gc
>> #>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
>> #> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
>> #> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2
>>
>> So not a huge difference, but it's there.
>
>
> Funny. I get similar results to yours above albeit with smaller differences.
> Usually < 5 percent.
>
> But with subset_ROW4 I see no consistent difference.
>
> In this example, it runs faster on average using `eval(mc)' to return the
> result:
>
> > arr <- array(rnorm(2^22),c(2^10,4,4,4))
> > i <- seq(1,length=10,by=100)
> > bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]
> # A tibble: 2 x 8
>   expression                      min     mean   median      max `itr/sec` mem_alloc  n_gc
>   <chr>                      <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl>
> 1 subset_ROW4(arr, i, FALSE)   28.9µs   34.9µs   32.1µs   1.36ms    28686.    5.05KB     5
> 2 subset_ROW4(arr, i, TRUE)    28.9µs     35µs   32.4µs 875.11µs    28572.    5.05KB     5
> >
>
> And on subsequent reps the lead switches back and forth.
>
>
> Chuck
>

--
http://hadley.nz

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel