Hi Harvey, Its exciting to see people thinking about and looking at ALTREP speedups "in the wild" :). You're absolutely right that pulling out the REAL call will give you a significant speedup, but ALTREP does add a little wrinkle (and a solution to it!). Detailed responses and comments inline:
On Mon, Jan 7, 2019 at 11:58 AM Harvey Smith <harvey13...@gmail.com> wrote: > I believe the performance of isUnsorted() in sort.c could be improved by > calling REAL() once (outside of the for loop), rather than calling it twice > inside the loop. As an aside, it is implemented in the faster way in > doSort() (sort.c line 401). The example below shows the performance > improvement for a vectors of double of moving REAL() outside the for loop. > > <snip> > > In light of ALTREP's inclusion in the R internals its best to avoid asking things for their full data vector when you don't need to. Instead, you can use the ITERATE_BY_REGION macro R (courtesy of Luke, I believe?) provides in <includedir>/R_ext/Itermacros.h. This is particularly true of R's internals, which also preferably won't "explode"/invalidate an ALTREP (which asking for a writable pointer does) when they don't need to. Most internal functions haven't been converted to this yet, as you see with is.unsorted (and its not a high priority to do the conversion until it becomes an issue for any given case), but this is what, e.g., R's own sum function now does. ITERATE_BY_REGION is based on *_GET_REGION, which was added to the C API as part ALTREP, but works on ALTREP and normal vectors, and won't explode in corner cases where materializing a full ALTREP vector would be problematic. The core concept for ITERATE_BY_REGION is to grab regions (a quick glance tells me its 512 elements at a time) of a vector, copying them into a buffer, and using the same trick your code does by avoiding pointer lookup inside the inner tight loop. Do note that as of now I had to compile my function with language="C", rather than the default "C++" to avoid an error about initializing a const double * with a const void * value. On my machine, at least, you actually *nearly* the same speedup with all the added safety. Eyeballing it I'm not convinced the difference is statistically signfiicant, to be honest, but even if it is, you get most of the benefit... body = " R_xlen_t n, i; n = XLENGTH(x); for(i = 0; i+1 < n ; i++) if(REAL(x)[i] > REAL(x)[i+1]) return ScalarLogical(TRUE); return ScalarLogical(FALSE);"; f1 = inline::cfunction(sig = signature(x='numeric'), body=body) body = " R_xlen_t n, i; n = XLENGTH(x); double* real_x = REAL(x); for(i = 0; i+1 < n ; i++) if(real_x[i] > real_x[i+1]) return ScalarLogical(TRUE); return ScalarLogical(FALSE);"; f2 = inline::cfunction(sig = signature(x='numeric'), body=body) body = " double tmp = -DBL_MAX; // minimum possible double value ITERATE_BY_REGION(x, xptr, i, nbatch, double, REAL, { if(xptr[0] < tmp) //deal with batch barriers, tmp is end of last batch return ScalarLogical(TRUE); for(R_xlen_t k = 0; k < nbatch - 1; k++) { if(xptr[k] > xptr[k+1]) return ScalarLogical(TRUE); } tmp = xptr[nbatch - 1]; }); return ScalarLogical(FALSE);"; f3 = inline::cfunction(sig = signature(x='numeric'), body=body, includes = '#include "R_ext/Itermacros.h"', language = "C") x.double = as.double(1:1e7) + 0 x.posixct = Sys.time() + x.double microbenchmark::microbenchmark( f1(x.double), f2(x.double), # one REAL call f3(x.double), # ITERATE_BY_REGION f1(x.posixct), f2(x.posixct), # one REAL call f3(x.posixct), # ITERATE_BY_REGION unit='ms', times=100) Unit: milliseconds expr min lq mean median uq max f1(x.double) 26.377432 27.234192 28.156993 27.774590 28.602643 32.213378 f2(x.double) 4.722712 4.854300 5.011549 4.991388 5.127996 5.523156 f3(x.double) 4.759537 4.788137 5.408925 5.373667 5.713877 6.694330 f1(x.posixct) 77.975030 78.853724 85.867995 82.530822 83.557849 123.546206 f2(x.posixct) 4.637912 4.660033 4.872892 4.750513 4.880569 5.907149 f3(x.posixct) 4.643806 4.665936 5.094212 5.085454 5.384414 5.778274 neval 10 10 10 10 10 10 To be extra careful we can check that we're getting all the edges right just incase, since the code is admittedly harder to follow and a bit more arcane: > x.double2 = x.double > x.double2[512] = x.double[1] #unsorted at end of first batch > stopifnot(f3(x.double2)) > > x.double2a = x.double > x.double2a[513] = x.double[1] #unsorted at beginning of 2nd batch > stopifnot(f3(x.double2a)) > > > ##check edges > x.double3 = x.double > x.double3[length(x.double3)] = x.double3[1] #unsorted at last element > stopifnot(f3(x.double3)) > > x.double4 = x.double > x.double4[1] = x.double[5] #unsorted at first element > stopifnot(f3(x.double4)) > If R-core is interested I'm happy to develop a patch for the isUnsorted workhorse c-function, at least for integers and reals. > I would also like to suggest ALTREP support for POSIXct vectors, which are > interpreted as type REAL in the c code, but do not gain the performance > benefits of real vectors. Sorted vectors of timestamps are important for > joining time series and in calls to findInterval(). > So looking at this, it is because is.object(x.posixct) returns true, which means sort.default does x[order(x, <bla bla bla>)], which ALTREP is not currently (and may not ever be?) smart enough to catch on its own and know is sorted. Its true we could add something after that to wrap it in what is called a wrapper altrep which would know it's sorted, but we don't do that currently now and I'm not sure we actually should in the general case. I'm not convinced its safe to assume an object class' defined ordering will match the ordering of an underlying double/int representation. I believe we ran into something similar with deferred sting conversions from integers (I think, possibly doubles) where the int had sortedness information but that wasn't correct for the *character vector *the ALTREP ultimately represented. Best, ~G PS above timings were in a mildly (~1 month) old version of R-devel: > sessionInfo() R Under development (unstable) (2018-12-11 r75837) Platform: x86_64-apple-darwin17.7.0 (64-bit) Running under: macOS High Sierra 10.13.6 > # unsorted vectors > x.double = as.double(1:1e7) + 0 > x.posixct = Sys.time() + x.double > # sort for altrep benefit > x.double.sort <- sort(x.double) > x.posixct.sort <- sort(x.posixct) > microbenchmark::microbenchmark( > is.unsorted(x.double), > is.unsorted(x.double.sort), # faster due to altrep > is.unsorted(x.posixct), > is.unsorted(x.posixct.sort), # no altrep benefit > unit='ms', times=10) > Unit: milliseconds > expr min lq mean median > uq max neval > is.unsorted(x.double) 16.987730 17.010008 17.1577173 17.0862785 > 17.308674 17.474432 10 > is.unsorted(x.double.sort) 0.000378 0.000756 0.0065327 0.0075525 > 0.010195 0.011706 10 > is.unsorted(x.posixct) 36.925876 37.084837 43.4125593 37.4695915 > 41.858589 78.742174 10 > is.unsorted(x.posixct.sort) 36.966654 37.031975 51.1228686 37.1235380 > 37.777319 153.270170 10 > > > Since there do not appear to be any tests for is.unsorted() these are some > tests to be added for some types. > > # integer sequence > x <- -10L:10L > stopifnot(!is.unsorted(x, na.rm = F, strictly = T)) > stopifnot(!is.unsorted(x, na.rm = F, strictly = F)) > # integer not strictly > x <- -10L:10L > x[2] <- x[3] > stopifnot( is.unsorted(x, na.rm = F, strictly = T)) > stopifnot(!is.unsorted(x, na.rm = F, strictly = F)) > # integer with NA > x <- -10L:10L > x[2] <- NA > stopifnot(!is.unsorted(x, na.rm = T, strictly = F)) > stopifnot(is.na(is.unsorted(x, na.rm = F, strictly = F))) > # double > x <- seq(from = -10, to = 10, by=0.01) > stopifnot(!is.unsorted(x, na.rm = F, strictly = T)) > stopifnot(!is.unsorted(x, na.rm = F, strictly = F)) > # double not strictly > x <- seq(from = -10, to = 10, by=0.01) > x[2] <- x[3] > stopifnot( is.unsorted(x, na.rm = F, strictly = T)) > stopifnot(!is.unsorted(x, na.rm = F, strictly = F)) > # double with NA > x <- seq(from = -10, to = 10, by=0.01) > x[length(x)] <- NA > stopifnot(!is.unsorted(x, na.rm = T, strictly = F)) > stopifnot(is.na(is.unsorted(x, na.rm = F, strictly = F))) > # logical > stopifnot(!is.unsorted( c(F, T, T), strictly = F)) > stopifnot( is.unsorted( c(F, T, T), strictly = T)) > stopifnot( is.unsorted( c(T, T, F), strictly = F)) > stopifnot( is.unsorted( c(T, T, F), strictly = T)) > # POSIXct > x <- seq(from=as.POSIXct('2018-1-1'), to=as.POSIXct('2019-1-1'), by='day') > stopifnot(!is.unsorted(x, na.rm = T, strictly = F)) > stopifnot(!is.unsorted(x, na.rm = F, strictly = F)) > # POSIXct not strictly > x <- seq(from=as.POSIXct('2018-1-1'), to=as.POSIXct('2019-1-1'), by='day') > x[2] <- x[3] > stopifnot( is.unsorted(x, na.rm = F, strictly = T)) > stopifnot(!is.unsorted(x, na.rm = F, strictly = F)) > # POSIXct with NA > x <- seq(from=as.POSIXct('2018-1-1'), to=as.POSIXct('2019-1-1'), by='day') > x[length(x)] <- NA > stopifnot(!is.unsorted(x, na.rm = T, strictly = F)) > stopifnot(is.na(is.unsorted(x, na.rm = F, strictly = F))) > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > [[alternative HTML version deleted]] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel