Re: [Rd] unsorted - suggestion for performance improvement and ALTREP support for POSIXct

Gabriel Becker Mon, 07 Jan 2019 17:18:27 -0800

Hi Harvey,

Its exciting to see people thinking about and looking at ALTREP speedups
"in the wild" :).   You're absolutely right that pulling out the REAL call
will give you a significant speedup, but ALTREP does add a little wrinkle
(and a solution to it!). Detailed responses and comments inline:


On Mon, Jan 7, 2019 at 11:58 AM Harvey Smith <harvey13...@gmail.com> wrote:

> I believe the performance of isUnsorted() in sort.c could be improved by
> calling REAL() once (outside of the for loop), rather than calling it twice
> inside the loop.   As an aside, it is implemented in the faster way in
> doSort() (sort.c line 401).  The example below shows the performance
> improvement for a vectors of double of moving REAL() outside the for loop.
>
> <snip>
>
>
In light of ALTREP's inclusion in the R internals its best to avoid asking
things for their full data vector when you don't need to. Instead, you can
use the ITERATE_BY_REGION macro R (courtesy of Luke, I believe?) provides
in <includedir>/R_ext/Itermacros.h. This is particularly true of R's
internals, which also preferably won't  "explode"/invalidate an ALTREP
(which asking for a writable pointer does) when they don't need to. Most
internal functions haven't been converted to this yet, as you see with
is.unsorted (and its not a high priority to do the conversion until it
becomes an issue for any given case), but this is what, e.g., R's own sum
function now does.

ITERATE_BY_REGION is based on *_GET_REGION, which was added to the C API
as part ALTREP, but works on ALTREP and normal vectors, and won't explode
in corner cases where materializing a full ALTREP vector would be
problematic. The core concept for ITERATE_BY_REGION is to grab regions (a
quick glance tells me its 512 elements at a time) of a vector, copying them
into a buffer, and using the same trick your code does by avoiding pointer
lookup inside the inner tight loop. Do note that as of now I had to compile
my function with language="C", rather than the default "C++" to avoid an
error about initializing a const double * with a const void * value.

On my machine, at least, you actually *nearly* the same speedup with all
the added safety. Eyeballing it I'm not convinced the difference is
statistically signfiicant, to be honest, but even if it is, you get most of
the benefit...


body = "
R_xlen_t n, i;
n = XLENGTH(x);
for(i = 0; i+1 < n ; i++)
  if(REAL(x)[i] > REAL(x)[i+1])
    return ScalarLogical(TRUE);
return ScalarLogical(FALSE);";
f1 = inline::cfunction(sig = signature(x='numeric'), body=body)

body = "
R_xlen_t n, i;
n = XLENGTH(x);
double* real_x = REAL(x);
for(i = 0; i+1 < n ; i++)
  if(real_x[i] > real_x[i+1])
    return ScalarLogical(TRUE);
return ScalarLogical(FALSE);";
f2 = inline::cfunction(sig = signature(x='numeric'), body=body)

body = "
double tmp = -DBL_MAX; // minimum possible double value
ITERATE_BY_REGION(x, xptr, i, nbatch, double, REAL, {
  if(xptr[0] < tmp) //deal with batch barriers, tmp is end of last batch
    return ScalarLogical(TRUE);
  for(R_xlen_t k = 0; k < nbatch - 1; k++) {
  if(xptr[k] > xptr[k+1])
    return ScalarLogical(TRUE);
  }
  tmp = xptr[nbatch - 1];
});
return ScalarLogical(FALSE);";
f3 = inline::cfunction(sig = signature(x='numeric'), body=body, includes =
'#include "R_ext/Itermacros.h"',
                       language = "C")

x.double = as.double(1:1e7) + 0
x.posixct = Sys.time() + x.double
microbenchmark::microbenchmark(
                    f1(x.double),
                    f2(x.double), # one REAL call
                    f3(x.double),  # ITERATE_BY_REGION
                    f1(x.posixct),
                    f2(x.posixct), # one REAL call
                    f3(x.posixct), # ITERATE_BY_REGION
                    unit='ms', times=100)



Unit: milliseconds

          expr       min        lq      mean    median        uq        max

  f1(x.double) 26.377432 27.234192 28.156993 27.774590 28.602643  32.213378

  f2(x.double)  4.722712  4.854300  5.011549  4.991388  5.127996   5.523156

  f3(x.double)  4.759537  4.788137  5.408925  5.373667  5.713877   6.694330

 f1(x.posixct) 77.975030 78.853724 85.867995 82.530822 83.557849 123.546206

 f2(x.posixct)  4.637912  4.660033  4.872892  4.750513  4.880569   5.907149

 f3(x.posixct)  4.643806  4.665936  5.094212  5.085454  5.384414   5.778274

 neval

    10

    10

    10

    10

    10

    10



To be extra careful we can check that we're getting all the edges right
just incase, since the code is admittedly harder to follow and a bit more
arcane:

> x.double2 = x.double

> x.double2[512] = x.double[1] #unsorted at end of first batch

> stopifnot(f3(x.double2))

>

> x.double2a = x.double

> x.double2a[513] = x.double[1] #unsorted at beginning of 2nd batch

> stopifnot(f3(x.double2a))

>

>

> ##check edges

> x.double3 = x.double

> x.double3[length(x.double3)] = x.double3[1] #unsorted at last element

> stopifnot(f3(x.double3))

>

> x.double4 = x.double

> x.double4[1] = x.double[5] #unsorted at first element

> stopifnot(f3(x.double4))
>



If R-core is interested I'm happy to develop a patch for the isUnsorted
workhorse c-function, at least for integers and reals.


> I would also like to suggest ALTREP support for POSIXct vectors, which are
> interpreted as type REAL in the c code, but do not gain the performance
> benefits of real vectors.  Sorted vectors of timestamps are important for
> joining time series and in calls to findInterval().
>

So looking at this, it is because is.object(x.posixct) returns true, which
means sort.default does x[order(x, <bla bla bla>)], which ALTREP is not
currently (and may not ever be?) smart enough to catch on its own and know
is sorted.

Its true we could add something after that to wrap it in what is called a
wrapper altrep which would know it's sorted, but we don't do that currently
now and I'm not sure we actually should in the general case. I'm not
convinced its safe to assume an object class' defined ordering will match
the ordering of an underlying double/int representation. I believe we ran
into something similar with deferred sting conversions from integers (I
think, possibly doubles) where the int had sortedness information but that
wasn't correct for the *character vector *the ALTREP ultimately represented.


Best,
~G


PS above timings were in a mildly (~1 month) old version of R-devel:

> sessionInfo()

R Under development (unstable) (2018-12-11 r75837)

Platform: x86_64-apple-darwin17.7.0 (64-bit)

Running under: macOS High Sierra 10.13.6



> # unsorted vectors
> x.double = as.double(1:1e7) + 0
> x.posixct = Sys.time() + x.double
> # sort for altrep benefit
> x.double.sort <- sort(x.double)
> x.posixct.sort <- sort(x.posixct)
> microbenchmark::microbenchmark(
>   is.unsorted(x.double),
>   is.unsorted(x.double.sort), # faster due to altrep
>   is.unsorted(x.posixct),
>   is.unsorted(x.posixct.sort), # no altrep benefit
>   unit='ms', times=10)
> Unit: milliseconds
>                         expr       min        lq       mean     median
>    uq        max neval
>        is.unsorted(x.double) 16.987730 17.010008 17.1577173 17.0862785
> 17.308674  17.474432    10
>   is.unsorted(x.double.sort)  0.000378  0.000756  0.0065327  0.0075525
>  0.010195   0.011706    10
>       is.unsorted(x.posixct) 36.925876 37.084837 43.4125593 37.4695915
> 41.858589  78.742174    10
>  is.unsorted(x.posixct.sort) 36.966654 37.031975 51.1228686 37.1235380
> 37.777319 153.270170    10
>
>
> Since there do not appear to be any tests for is.unsorted() these are some
> tests to be added for some types.
>
> # integer sequence
> x <- -10L:10L
> stopifnot(!is.unsorted(x, na.rm = F, strictly = T))
> stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
> # integer not strictly
> x <- -10L:10L
> x[2] <- x[3]
> stopifnot( is.unsorted(x, na.rm = F, strictly = T))
> stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
> # integer with NA
> x <- -10L:10L
> x[2] <- NA
> stopifnot(!is.unsorted(x, na.rm = T, strictly = F))
> stopifnot(is.na(is.unsorted(x, na.rm = F, strictly = F)))
> # double
> x <- seq(from = -10, to = 10, by=0.01)
> stopifnot(!is.unsorted(x, na.rm = F, strictly = T))
> stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
> # double not strictly
> x <- seq(from = -10, to = 10, by=0.01)
> x[2] <- x[3]
> stopifnot( is.unsorted(x, na.rm = F, strictly = T))
> stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
> # double with NA
> x <- seq(from = -10, to = 10, by=0.01)
> x[length(x)] <- NA
> stopifnot(!is.unsorted(x, na.rm = T, strictly = F))
> stopifnot(is.na(is.unsorted(x, na.rm = F, strictly = F)))
> # logical
> stopifnot(!is.unsorted( c(F, T, T), strictly = F))
> stopifnot( is.unsorted( c(F, T, T), strictly = T))
> stopifnot( is.unsorted( c(T, T, F), strictly = F))
> stopifnot( is.unsorted( c(T, T, F), strictly = T))
> # POSIXct
> x <- seq(from=as.POSIXct('2018-1-1'), to=as.POSIXct('2019-1-1'), by='day')
> stopifnot(!is.unsorted(x, na.rm = T, strictly = F))
> stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
> # POSIXct not strictly
> x <- seq(from=as.POSIXct('2018-1-1'), to=as.POSIXct('2019-1-1'), by='day')
> x[2] <- x[3]
> stopifnot( is.unsorted(x, na.rm = F, strictly = T))
> stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
> # POSIXct with NA
> x <- seq(from=as.POSIXct('2018-1-1'), to=as.POSIXct('2019-1-1'), by='day')
> x[length(x)] <- NA
> stopifnot(!is.unsorted(x, na.rm = T, strictly = F))
> stopifnot(is.na(is.unsorted(x, na.rm = F, strictly = F)))
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

        [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] unsorted - suggestion for performance improvement and ALTREP support for POSIXct

Reply via email to