I'm all done now. The "max2" version below is what I went with in the end for my proposed change to caret::nearZeroVar (which used the "sort" method). Max Kuhn will make it available on CRAN soon. It speeds up that routine by a factor 2-5 on my test cases and uses much less memory. For what it is worth, I also made a C version ("cmax" below) which of course is faster yet again and scales nicely for returning the top n values of the array:

cmax <- function (v) {max <- vector("double",2); max <- .C("test", as.double(v), as.integer(length(v)), max, NAOK=TRUE)[[3]]; return(max[1]/max[2]);}

library("rbenchmark")
set.seed(1); x <- runif(1e7, max=1e8); x[1] <- NA;
benchmark(
replications=20,
columns=c("test","elapsed"),
order="elapsed"
, sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
, qsrt = {a<-sort(x, decreasing=TRUE, na.last=NA, method="quick")[1:2]; a[1]/a[2];}
, part = {a<-sort.int(-x, partial=1:2, na.last=NA)[1:2]; a[1]/a[2];}
, max1 = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; m/max(x[-w],na.rm=TRUE);}
, max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
, cmax = {cmax(x);}
)
#   test elapsed
# 6 cmax   4.394
# 5 max2   8.954
# 4 max1  18.835
# 3 part  21.749
# 2 qsrt  46.692
# 1 sort  77.679

Thanks for all the suggestions and comments.

Allan.


PS: Slightly off-topic but is there a way within the syntax of R to set up things so that 'sort' (or any function) would know it is called in a partial list context in sort(x)[1:2] and it therefore could choose to use the "partial" argument automatically for small [] lists? The R interpreter of course knows full well that it is going to drop all but the first two values of the result before it calls 'sort'. Perl has 'use Want' where howmany() and want(n) provides a subset of this functionality (essentially for [] lists of the form 1:n).

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to