My comments have nothing to do with speed of your code, but with the correctness.
> > np.lm <-function(dat, X, Y, ...){ > > # Ch 9.2: Slope est. (X) for Thiel statistic > > ... > > num[[i]] <- dat[j.s[i],Y] - dat[i.s[i],Y] > > dom[[i]] <- dat[j.s[i],X] - dat[i.s[i],X] Up to here is looks like X and Y are used to indicate which columns of dat are the predictor and response, respectively. > > .. > > X <- median( sort( do.call(c, num) / do.call(c, dom) ) ) Now X is used to mean the slope of the regression. > > # Ch 9.4: Intercept est. for Thiel statistic > > Intercept <- median(dat[,"Y"] - X*dat[,"X"]) Now the predictor column must be named "X" and the response "Y", no matter what your input X and Y were. Hence the function will fail to give the correct answer in most cases. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf > Of Berend Hasselman > Sent: Sunday, October 21, 2012 11:59 AM > To: Brad Schneid > Cc: r-help@r-project.org > Subject: Re: [R] help speeding up simple Theil regression function > > > On 21-10-2012, at 20:06, Brad Schneid wrote: > > > Hello, > > > > I am working on a simple non-parametric (Theil) regression function and and > > am following Hollander and Wolfe 1999 text. I would like some help making > > my function faster. I have compared with pre-packaged version from "MBLM", > > which isnt very fast either, but it appears mine is faster with N = 1000 > > (see results below). I plan on running this function repeatedly, and I > > generally have data lengths of ~ N = 6000 or more. > > > > # My function following Hollander and Wolfe text, Chapter 9 > > np.lm <-function(dat, X, Y, ...){ > > # Ch 9.2: Slope est. (X) for Thiel statistic > > combos <- combn(nrow(dat), 2) > > i.s <- combos[1,] > > j.s <- combos[2,] > > num <- vector("list", length=length(i.s)) > > dom <- vector("list", length=length(i.s)) > > > > for(i in 1:length(i.s)){ > > num[[i]] <- dat[j.s[i],Y] - dat[i.s[i],Y] > > dom[[i]] <- dat[j.s[i],X] - dat[i.s[i],X] > > } > > > > X <- median( sort( do.call(c, num) / do.call(c, dom) ) ) > > # Ch 9.4: Intercept est. for Thiel statistic > > Intercept <- median(dat[,"Y"] - X*dat[,"X"]) > > out <- data.frame(Intercept, X) > > return(out) > > } # usage: np.lm(dat, X=1, Y=2) > > ################################################################ > > > > library("mblm") # I will compare to mblm() function > > > > X <- rnorm(1000) > > Y <- rnorm(1000) > > dat <- data.frame(X, Y) > > > > system.time(np.lm(dat, X=1, Y=2) ) > > user system elapsed > > 118.610 0.130 119.144 > > 109.000 0.040 109.416 # ran it twice > > 86.190 0.100 86.589 # 3rd time > > Alternative function without your i loop (it isn't needed and can be > vectorized): > > np.lm.alt <-function(dat, X, Y, ...){ > # Ch 9.2: Slope est. (X) for Thiel statistic ==> (Pedantic comment: it > is Theil (swap > the i and e) > combos <- combn(nrow(dat), 2) > i.s <- combos[1,] > j.s <- combos[2,] > > Y.num <- dat[j.s,Y] - dat[i.s,Y] > X.dom <- dat[j.s,X] - dat[i.s,X] > X <- median( Y.num / X.dom) > # Ch 9.4: Intercept est. for Thiel statistic ==> (Pedantic comment: it > is Theil (swap > the i and e) > Intercept <- median(dat[,"Y"] - X*dat[,"X"]) > out <- data.frame(Intercept, X) > return(out) > } # usage: np.lm(dat, X=1, Y=2) > > > Try the compiler package on you original function: > > library(compiler) > np.lm.c <- cmpfun(np.lm) > > Test speed and correct results: > > X <- rnorm(500) > Y <- rnorm(500) > dat <- data.frame(X, Y) > > system.time(npout.c <- np.lm.c(dat, X=1, Y=2) ) > system.time(npout.1 <- np.lm(dat, X=1, Y=2) ) > system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) ) > identical(npout.1,npout.c) > identical(npout.1,npout.a) > > Results: > > > system.time(npout.c <- np.lm.c(dat, X=1, Y=2) ) > user system elapsed > 21.442 0.066 21.517 > > system.time(npout.1 <- np.lm(dat, X=1, Y=2) ) > user system elapsed > 21.068 0.073 21.161 > > system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) ) > user system elapsed > 0.303 0.010 0.313 > > identical(npout.1,npout.c) > [1] TRUE > > identical(npout.1,npout.a) > [1] TRUE > > You may try and test this with larger data lengths. > > > Berend > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.