On 21-10-2012, at 20:06, Brad Schneid wrote: > Hello, > > I am working on a simple non-parametric (Theil) regression function and and > am following Hollander and Wolfe 1999 text. I would like some help making > my function faster. I have compared with pre-packaged version from "MBLM", > which isnt very fast either, but it appears mine is faster with N = 1000 > (see results below). I plan on running this function repeatedly, and I > generally have data lengths of ~ N = 6000 or more. > > # My function following Hollander and Wolfe text, Chapter 9 > np.lm <-function(dat, X, Y, ...){ > # Ch 9.2: Slope est. (X) for Thiel statistic > combos <- combn(nrow(dat), 2) > i.s <- combos[1,] > j.s <- combos[2,] > num <- vector("list", length=length(i.s)) > dom <- vector("list", length=length(i.s)) > > for(i in 1:length(i.s)){ > num[[i]] <- dat[j.s[i],Y] - dat[i.s[i],Y] > dom[[i]] <- dat[j.s[i],X] - dat[i.s[i],X] > } > > X <- median( sort( do.call(c, num) / do.call(c, dom) ) ) > # Ch 9.4: Intercept est. for Thiel statistic > Intercept <- median(dat[,"Y"] - X*dat[,"X"]) > out <- data.frame(Intercept, X) > return(out) > } # usage: np.lm(dat, X=1, Y=2) > ################################################################ > > library("mblm") # I will compare to mblm() function > > X <- rnorm(1000) > Y <- rnorm(1000) > dat <- data.frame(X, Y) > > system.time(np.lm(dat, X=1, Y=2) ) > user system elapsed > 118.610 0.130 119.144 > 109.000 0.040 109.416 # ran it twice > 86.190 0.100 86.589 # 3rd time
Alternative function without your i loop (it isn't needed and can be vectorized): np.lm.alt <-function(dat, X, Y, ...){ # Ch 9.2: Slope est. (X) for Thiel statistic ==> (Pedantic comment: it is Theil (swap the i and e) combos <- combn(nrow(dat), 2) i.s <- combos[1,] j.s <- combos[2,] Y.num <- dat[j.s,Y] - dat[i.s,Y] X.dom <- dat[j.s,X] - dat[i.s,X] X <- median( Y.num / X.dom) # Ch 9.4: Intercept est. for Thiel statistic ==> (Pedantic comment: it is Theil (swap the i and e) Intercept <- median(dat[,"Y"] - X*dat[,"X"]) out <- data.frame(Intercept, X) return(out) } # usage: np.lm(dat, X=1, Y=2) Try the compiler package on you original function: library(compiler) np.lm.c <- cmpfun(np.lm) Test speed and correct results: X <- rnorm(500) Y <- rnorm(500) dat <- data.frame(X, Y) system.time(npout.c <- np.lm.c(dat, X=1, Y=2) ) system.time(npout.1 <- np.lm(dat, X=1, Y=2) ) system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) ) identical(npout.1,npout.c) identical(npout.1,npout.a) Results: > system.time(npout.c <- np.lm.c(dat, X=1, Y=2) ) user system elapsed 21.442 0.066 21.517 > system.time(npout.1 <- np.lm(dat, X=1, Y=2) ) user system elapsed 21.068 0.073 21.161 > system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) ) user system elapsed 0.303 0.010 0.313 > identical(npout.1,npout.c) [1] TRUE > identical(npout.1,npout.a) [1] TRUE You may try and test this with larger data lengths. Berend ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.