On 21-10-2012, at 20:06, Brad Schneid wrote:

> Hello,
> 
> I am working on a simple non-parametric (Theil) regression function and and
> am following Hollander and Wolfe 1999 text.  I would like some help making
> my function faster.  I have compared with pre-packaged version from "MBLM",
> which isnt very fast either, but it appears mine is faster with N = 1000
> (see results below).  I plan on running this function repeatedly, and I
> generally have data lengths of ~ N = 6000 or more.  
> 
> # My function following Hollander and Wolfe text, Chapter 9
> np.lm <-function(dat, X, Y, ...){
>       # Ch 9.2: Slope est. (X) for Thiel statistic
>       combos <- combn(nrow(dat), 2)
>       i.s <- combos[1,] 
>       j.s <- combos[2,] 
>       num <- vector("list", length=length(i.s))
>       dom <- vector("list", length=length(i.s))
> 
>               for(i in 1:length(i.s)){
>                       num[[i]]  <- dat[j.s[i],Y] - dat[i.s[i],Y]    
>                       dom[[i]]  <- dat[j.s[i],X] - dat[i.s[i],X]    
>                               }       
>       
>       X <- median( sort( do.call(c, num) / do.call(c, dom) ) )
>       # Ch 9.4: Intercept est. for Thiel statistic
>       Intercept <- median(dat[,"Y"] - X*dat[,"X"])
>       out <- data.frame(Intercept, X)
>       return(out)
>               }   # usage: np.lm(dat, X=1, Y=2)
> ################################################################
> 
> library("mblm") # I will compare to mblm() function
> 
> X <- rnorm(1000)
> Y <- rnorm(1000)
> dat <- data.frame(X, Y)
> 
> system.time(np.lm(dat, X=1, Y=2) )
>   user  system elapsed 
> 118.610   0.130 119.144 
> 109.000   0.040 109.416 # ran it twice
> 86.190   0.100  86.589 # 3rd time

Alternative function without your i loop (it isn't needed and can be 
vectorized):

np.lm.alt <-function(dat, X, Y, ...){
        # Ch 9.2: Slope est. (X) for Thiel statistic ==> (Pedantic comment: it 
is Theil (swap the i and e)
        combos <- combn(nrow(dat), 2)
        i.s <- combos[1,] 
        j.s <- combos[2,] 
        
        Y.num <- dat[j.s,Y] - dat[i.s,Y]
        X.dom <- dat[j.s,X] - dat[i.s,X]
        X <- median( Y.num / X.dom)
        # Ch 9.4: Intercept est. for Thiel statistic ==> (Pedantic comment: it 
is Theil (swap the i and e)
        Intercept <- median(dat[,"Y"] - X*dat[,"X"])
        out <- data.frame(Intercept, X)
        return(out)
}   # usage: np.lm(dat, X=1, Y=2)


Try the compiler package on you original function:

library(compiler)
np.lm.c <- cmpfun(np.lm)

Test speed and correct results:

X <- rnorm(500)
Y <- rnorm(500)
dat <- data.frame(X, Y)

system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
identical(npout.1,npout.c)      
identical(npout.1,npout.a)

Results:

> system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
   user  system elapsed 
 21.442   0.066  21.517 
> system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
   user  system elapsed 
 21.068   0.073  21.161 
> system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
   user  system elapsed 
  0.303   0.010   0.313 
> identical(npout.1,npout.c)      
[1] TRUE
> identical(npout.1,npout.a)
[1] TRUE

You may try and test this with larger data lengths.


Berend

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to