I understand, thank you. I think I wanted the output to look similar to that from mblm for quick comparison; that is obviously a problem.
William Dunlap wrote > My comments have nothing to do with speed of your code, > but with the correctness. > >> > np.lm <-function(dat, X, Y, ...){ >> > # Ch 9.2: Slope est. (X) for Thiel statistic >> > ... >> > num[[i]] <- dat[j.s[i],Y] - dat[i.s[i],Y] >> > dom[[i]] <- dat[j.s[i],X] - dat[i.s[i],X] > > Up to here is looks like X and Y are used to indicate which columns > of dat are the predictor and response, respectively. > >> > .. >> > X <- median( sort( do.call(c, num) / do.call(c, dom) ) ) > > Now X is used to mean the slope of the regression. > >> > # Ch 9.4: Intercept est. for Thiel statistic >> > Intercept <- median(dat[,"Y"] - X*dat[,"X"]) > > Now the predictor column must be named "X" and the response "Y", > no matter what your input X and Y were. Hence the function will fail > to give the correct answer in most cases. > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > > >> -----Original Message----- >> From: > r-help-bounces@ > [mailto: > r-help-bounces@ > ] On Behalf >> Of Berend Hasselman >> Sent: Sunday, October 21, 2012 11:59 AM >> To: Brad Schneid >> Cc: > r-help@ >> Subject: Re: [R] help speeding up simple Theil regression function >> >> >> On 21-10-2012, at 20:06, Brad Schneid wrote: >> >> > Hello, >> > >> > I am working on a simple non-parametric (Theil) regression function and >> and >> > am following Hollander and Wolfe 1999 text. I would like some help >> making >> > my function faster. I have compared with pre-packaged version from >> "MBLM", >> > which isnt very fast either, but it appears mine is faster with N = >> 1000 >> > (see results below). I plan on running this function repeatedly, and I >> > generally have data lengths of ~ N = 6000 or more. >> > >> > # My function following Hollander and Wolfe text, Chapter 9 >> > np.lm <-function(dat, X, Y, ...){ >> > # Ch 9.2: Slope est. (X) for Thiel statistic >> > combos <- combn(nrow(dat), 2) >> > i.s <- combos[1,] >> > j.s <- combos[2,] >> > num <- vector("list", length=length(i.s)) >> > dom <- vector("list", length=length(i.s)) >> > >> > for(i in 1:length(i.s)){ >> > num[[i]] <- dat[j.s[i],Y] - dat[i.s[i],Y] >> > dom[[i]] <- dat[j.s[i],X] - dat[i.s[i],X] >> > } >> > >> > X <- median( sort( do.call(c, num) / do.call(c, dom) ) ) >> > # Ch 9.4: Intercept est. for Thiel statistic >> > Intercept <- median(dat[,"Y"] - X*dat[,"X"]) >> > out <- data.frame(Intercept, X) >> > return(out) >> > } # usage: np.lm(dat, X=1, Y=2) >> > ################################################################ >> > >> > library("mblm") # I will compare to mblm() function >> > >> > X <- rnorm(1000) >> > Y <- rnorm(1000) >> > dat <- data.frame(X, Y) >> > >> > system.time(np.lm(dat, X=1, Y=2) ) >> > user system elapsed >> > 118.610 0.130 119.144 >> > 109.000 0.040 109.416 # ran it twice >> > 86.190 0.100 86.589 # 3rd time >> >> Alternative function without your i loop (it isn't needed and can be >> vectorized): >> >> np.lm.alt <-function(dat, X, Y, ...){ >> # Ch 9.2: Slope est. (X) for Thiel statistic ==> (Pedantic comment: it >> is Theil (swap >> the i and e) >> combos <- combn(nrow(dat), 2) >> i.s <- combos[1,] >> j.s <- combos[2,] >> >> Y.num <- dat[j.s,Y] - dat[i.s,Y] >> X.dom <- dat[j.s,X] - dat[i.s,X] >> X <- median( Y.num / X.dom) >> # Ch 9.4: Intercept est. for Thiel statistic ==> (Pedantic comment: it >> is Theil (swap >> the i and e) >> Intercept <- median(dat[,"Y"] - X*dat[,"X"]) >> out <- data.frame(Intercept, X) >> return(out) >> } # usage: np.lm(dat, X=1, Y=2) >> >> >> Try the compiler package on you original function: >> >> library(compiler) >> np.lm.c <- cmpfun(np.lm) >> >> Test speed and correct results: >> >> X <- rnorm(500) >> Y <- rnorm(500) >> dat <- data.frame(X, Y) >> >> system.time(npout.c <- np.lm.c(dat, X=1, Y=2) ) >> system.time(npout.1 <- np.lm(dat, X=1, Y=2) ) >> system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) ) >> identical(npout.1,npout.c) >> identical(npout.1,npout.a) >> >> Results: >> >> > system.time(npout.c <- np.lm.c(dat, X=1, Y=2) ) >> user system elapsed >> 21.442 0.066 21.517 >> > system.time(npout.1 <- np.lm(dat, X=1, Y=2) ) >> user system elapsed >> 21.068 0.073 21.161 >> > system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) ) >> user system elapsed >> 0.303 0.010 0.313 >> > identical(npout.1,npout.c) >> [1] TRUE >> > identical(npout.1,npout.a) >> [1] TRUE >> >> You may try and test this with larger data lengths. >> >> >> Berend >> >> ______________________________________________ >> > R-help@ > mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@ > mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://r.789695.n4.nabble.com/help-speeding-up-simple-Theil-regression-function-tp4646923p4646935.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.