Re: [Rd] Subsetting the "ROW"s of an object

Hervé Pagès Fri, 08 Jun 2018 14:09:00 -0700

The C code for subsetting doesn't need to recycle a logical subscript.
It only needs to walk along it and start again at the beginning of the
vector when it reaches the end. That is not exactly the same as detecting
the "take everything along that dimension" situation, though.
x[TRUE, TRUE, TRUE] triggers the full subsetting machinery, whereas x[]
and x[ , , ] could (and should) easily avoid it.
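
To illustrate the distinction at the R level (a small sketch only; it says
nothing about the C internals): a logical subscript shorter than the extent
is cycled, so a lone TRUE selects everything and gives the same result as an
empty subscript. The objects x and v are made-up examples.

```r
x <- array(seq_len(24), dim = c(2, 3, 4))   # hypothetical example array

# A length-1 TRUE is cycled along each dimension, so it selects everything,
# giving the same result as the empty subscripts.
same_as_empty <- identical(x[TRUE, TRUE, TRUE], x[, , ])
same_as_x     <- identical(x[], x)

# A shorter logical subscript is cycled the same way: TRUE, FALSE, TRUE, ...
v <- 1:6
odd_positions <- v[c(TRUE, FALSE)]   # 1 3 5
```

The results are the same either way; the point above is only about how much
work is done to get there.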


H.

On 06/08/2018 01:49 PM, Hadley Wickham wrote:

Hmmm, yes, there must be some special case in the C code to avoid
recycling a length-1 logical vector:

dims <- c(4, 4, 4, 1e5)

arr <- array(rnorm(prod(dims)), dims)
dim(arr)
#> [1]      4      4      4 100000
i <- c(1, 3)

bench::mark(
   arr[i, TRUE, TRUE, TRUE],
   arr[i, , , ]
)[c("expression", "min", "mean", "max")]
#> # A tibble: 2 x 4
#>   expression                    min     mean      max
#>   <chr>                    <bch:tm> <bch:tm> <bch:tm>
#> 1 arr[i, TRUE, TRUE, TRUE]   41.8ms   43.6ms   46.5ms
#> 2 arr[i, , , ]               41.7ms   43.1ms   46.3ms


On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles <[email protected]> wrote:

On Jun 8, 2018, at 11:52 AM, Hadley Wickham <[email protected]> wrote:

On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles <[email protected]> wrote:

On Jun 8, 2018, at 10:37 AM, Hervé Pagès <[email protected]> wrote:

Also the TRUEs cause problems if some dimensions are 0:

matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]

Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
   (subscript) logical subscript too long


OK. But this is easy enough to handle.
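
One possible way to handle it, sketched under the assumption that x has a
dim attribute: build the call with empty (missing) subscripts instead of
TRUEs, so that zero-extent dimensions never meet a logical subscript. The
name subset_ROW_missing is made up for this illustration and is not from
the thread.

```r
# Hypothetical helper: select "rows" along the first dimension of an array,
# leaving every other subscript empty rather than passing TRUE.
subset_ROW_missing <- function(x, i) {
    nd <- length(dim(x))
    stopifnot(nd >= 1L)                      # assumption: x has dimensions
    # quote(expr = ) yields the empty symbol, i.e. a missing argument
    empty <- rep(list(quote(expr = )), nd - 1L)
    cl <- as.call(c(list(as.name("["), quote(x), quote(i)),
                    empty,
                    list(drop = FALSE)))
    eval(cl)                                 # evaluates e.g. x[i, , , drop = FALSE]
}

# The zero-column case that fails with TRUE works with an empty subscript:
dim(subset_ROW_missing(matrix(raw(0), nrow = 5, ncol = 0), 1:3))   # 3 0
```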


H.

On 06/08/2018 10:29 AM, Hadley Wickham wrote:

I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP support for
recycling.)

Hadley



AFAICS, it is not an issue. Taking

arr <- array(rnorm(2^22),c(2^10,4,4,4))

as a test case

and using a function that will either use the literal code `x[i,,,,drop=FALSE]' 
or `eval(mc)':

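subset_ROW4 <-
     function(x, i, useLiteral=FALSE)
{
    ## Hard-coded call, valid only for a 4-dimensional x
    literal <- quote(x[i,,,,drop=FALSE])
    ## Generic call: start from x[i], then append one TRUE per remaining dimension
    mc <- quote(x[i])
    nd <- max(1L, length(dim(x)))
    mc[seq(4,length.out=nd-1L)] <- rep(TRUE, nd-1L)
    mc[["drop"]] <- FALSE
    if (useLiteral)
        eval(literal)
    else
        eval(mc)
}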

I get identical times with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))

and with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))


I think that's because you used a relatively low-precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression        min    mean   median      max  n_gc
#>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
#> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.
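
For readers without bench installed, the same idea can be approximated in
base R. This is a rough sketch only: the helper time_reps is made up for
this illustration, the array is smaller than the one above to keep it quick,
and system.time() is much coarser than bench::mark(), so small differences
may well be lost in noise. The point is just that the index is built once,
outside the timed code.

```r
arr <- array(rnorm(2^16), c(2^10, 4, 4, 4))   # smaller than the array in the thread
i <- seq(1, length.out = 10, by = 100)        # built once, not inside the timed loop

# Hypothetical helper: elapsed seconds for `reps` calls of f()
time_reps <- function(f, reps = 2000L) {
    system.time(for (k in seq_len(reps)) f())[["elapsed"]]
}

t_true    <- time_reps(function() arr[i, TRUE, TRUE, TRUE])
t_missing <- time_reps(function() arr[i, , , ])

# Both forms return the same values; only the timing can differ.
same_result <- identical(arr[i, TRUE, TRUE, TRUE], arr[i, , , ])
```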



Funny. I get similar results to yours above, albeit with smaller
differences, usually < 5 percent.

But with subset_ROW4 I see no consistent difference.

In this example, it runs faster on average using `eval(mc)' to return the 
result:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length=10,by=100)
bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]

# A tibble: 2 x 8
  expression                      min     mean   median      max `itr/sec` mem_alloc  n_gc
  <chr>                      <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl>
1 subset_ROW4(arr, i, FALSE)   28.9µs   34.9µs   32.1µs   1.36ms    28686.    5.05KB     5
2 subset_ROW4(arr, i, TRUE)    28.9µs     35µs   32.4µs 875.11µs    28572.    5.05KB     5


And on subsequent reps the lead switches back and forth.


Chuck


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [email protected]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
