Re: [R] help speeding up simple Theil regression function

William Dunlap Sun, 21 Oct 2012 12:12:59 -0700

My comments have nothing to do with speed of your code,
but with the correctness.


> > np.lm <-function(dat, X, Y, ...){
> >     # Ch 9.2: Slope est. (X) for Thiel statistic
> > ...
> >                     num[[i]]  <- dat[j.s[i],Y] - dat[i.s[i],Y]
> >                     dom[[i]]  <- dat[j.s[i],X] - dat[i.s[i],X]

Up to here is looks like X and Y are used to indicate which columns
of dat are the predictor and response, respectively.

> > ..
> >     X <- median( sort( do.call(c, num) / do.call(c, dom) ) )

Now X is used to mean the slope of the regression.

> >     # Ch 9.4: Intercept est. for Thiel statistic
> >     Intercept <- median(dat[,"Y"] - X*dat[,"X"])

Now the predictor column must be named "X" and the response "Y",
no matter what your input X and Y were.  Hence the function will fail
to give the correct answer in most cases.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of Berend Hasselman
> Sent: Sunday, October 21, 2012 11:59 AM
> To: Brad Schneid
> Cc: r-help@r-project.org
> Subject: Re: [R] help speeding up simple Theil regression function
> 
> 
> On 21-10-2012, at 20:06, Brad Schneid wrote:
> 
> > Hello,
> >
> > I am working on a simple non-parametric (Theil) regression function and and
> > am following Hollander and Wolfe 1999 text.  I would like some help making
> > my function faster.  I have compared with pre-packaged version from "MBLM",
> > which isnt very fast either, but it appears mine is faster with N = 1000
> > (see results below).  I plan on running this function repeatedly, and I
> > generally have data lengths of ~ N = 6000 or more.
> >
> > # My function following Hollander and Wolfe text, Chapter 9
> > np.lm <-function(dat, X, Y, ...){
> >     # Ch 9.2: Slope est. (X) for Thiel statistic
> >     combos <- combn(nrow(dat), 2)
> >     i.s <- combos[1,]
> >     j.s <- combos[2,]
> >     num <- vector("list", length=length(i.s))
> >     dom <- vector("list", length=length(i.s))
> >
> >             for(i in 1:length(i.s)){
> >                     num[[i]]  <- dat[j.s[i],Y] - dat[i.s[i],Y]
> >                     dom[[i]]  <- dat[j.s[i],X] - dat[i.s[i],X]
> >                             }
> >
> >     X <- median( sort( do.call(c, num) / do.call(c, dom) ) )
> >     # Ch 9.4: Intercept est. for Thiel statistic
> >     Intercept <- median(dat[,"Y"] - X*dat[,"X"])
> >     out <- data.frame(Intercept, X)
> >     return(out)
> >             }   # usage: np.lm(dat, X=1, Y=2)
> > ################################################################
> >
> > library("mblm") # I will compare to mblm() function
> >
> > X <- rnorm(1000)
> > Y <- rnorm(1000)
> > dat <- data.frame(X, Y)
> >
> > system.time(np.lm(dat, X=1, Y=2) )
> >   user  system elapsed
> > 118.610   0.130 119.144
> > 109.000   0.040 109.416 # ran it twice
> > 86.190   0.100  86.589 # 3rd time
> 
> Alternative function without your i loop (it isn't needed and can be 
> vectorized):
> 
> np.lm.alt <-function(dat, X, Y, ...){
>       # Ch 9.2: Slope est. (X) for Thiel statistic ==> (Pedantic comment: it 
> is Theil (swap
> the i and e)
>       combos <- combn(nrow(dat), 2)
>       i.s <- combos[1,]
>       j.s <- combos[2,]
> 
>       Y.num <- dat[j.s,Y] - dat[i.s,Y]
>       X.dom <- dat[j.s,X] - dat[i.s,X]
>       X <- median( Y.num / X.dom)
>       # Ch 9.4: Intercept est. for Thiel statistic ==> (Pedantic comment: it 
> is Theil (swap
> the i and e)
>       Intercept <- median(dat[,"Y"] - X*dat[,"X"])
>       out <- data.frame(Intercept, X)
>       return(out)
> }   # usage: np.lm(dat, X=1, Y=2)
> 
> 
> Try the compiler package on you original function:
> 
> library(compiler)
> np.lm.c <- cmpfun(np.lm)
> 
> Test speed and correct results:
> 
> X <- rnorm(500)
> Y <- rnorm(500)
> dat <- data.frame(X, Y)
> 
> system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
> system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
> system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
> identical(npout.1,npout.c)
> identical(npout.1,npout.a)
> 
> Results:
> 
> > system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
>    user  system elapsed
>  21.442   0.066  21.517
> > system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
>    user  system elapsed
>  21.068   0.073  21.161
> > system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
>    user  system elapsed
>   0.303   0.010   0.313
> > identical(npout.1,npout.c)
> [1] TRUE
> > identical(npout.1,npout.a)
> [1] TRUE
> 
> You may try and test this with larger data lengths.
> 
> 
> Berend
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] help speeding up simple Theil regression function

Reply via email to