Re: [R] Python and R

Kenn Konstabel Fri, 20 Feb 2009 03:51:50 -0800

Decyphering formulas seems to be the most time consuming part of lm:

mylm1 <- function(formula, data) {
    # not perfect but works
    F <- model.frame(formula,data)
    y <- model.response(F)
    mt <- attr(F, "terms")
    x <- model.matrix(mt,F)
    coefs <- solve(crossprod(x), crossprod(x,y))
    coefs
    }


mylm2 <- function(x, y, intercept=TRUE) {
    if(!is.matrix(x)) x <- as.matrix(x)
    if(intercept) x <- cbind(1,x)
    if(!is.matrix(y)) y <- as.matrix(y)
    solve(crossprod(x), crossprod(x,y))
    }

> system.time(for(i in 1:1000) mylm2(EuStockMarkets[,-1],
EuStockMarkets[,"DAX"]))
   user  system elapsed
   6.43    0.00    6.53
> system.time(for(i in 1:1000) mylm1(DAX~., EuStockMarkets))
   user  system elapsed
  16.19    0.00   16.23
> system.time(for(i in 1:1000) lm(DAX~., EuStockMarkets))
   user  system elapsed
  21.43    0.00   21.44

So if you need to save time, I'd suggest something close to mylm2 rather
than mylm1.

Kenn



On Thu, Feb 19, 2009 at 8:04 PM, Gabor Grothendieck <ggrothendi...@gmail.com
> wrote:

> On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian <esmail...@gmail.com>
> wrote:
>
> Thanks for the suggestions, I'll have to see if I can figure out how to
> > convert the relatively simple call to lm with an equation and the data
> file
> > to the functions you mention (or if that's even feasible).
>
> X <- model.matrix(formula, data)
>
> will calculate the X matrix for you.
>
> >
> > Not an expert in statistics myself, I am mostly concentrating on the
> > programming aspects of R. Problem is that I suspect my colleagues who
> > are providing some guidance with the stats end are not quite experts
> > themselves, and certainly new to R.
> >
> > Cheers,
> >
> > Esmail
> >
> > Kenn Konstabel wrote:
> >>
> >> lm does lots of computations, some of which you may never need. If speed
> >> really matters, you might want to compute only those things you will
> really
> >> use. If you only need coefficients, then using %*%, solve and crossprod
> will
> >> be remarkably faster than lm
> >>
> >> # repeating someone else's example
> >> # lm(DAX~., EuStockMarkets)
> >>
> >>  y <- EuStockMarkets[,"DAX"]
> >>  x <- EuStockMarkets
> >>  x[,1]<-1
> >> colnames(x)[1] <- "Intercept"
> >>
> >> lm(y ~ x-1)
> >> solve(crossprod(x), t(x))%*%y    # probably this can be done more
> >> efficiently
> >>
> >> # and a naive timing
> >>
> >>  > system.time( for(i in 1:1000) lm(y ~ x-1))
> >>   user  system elapsed
> >>  14.64    0.33   32.69
> >>  > system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
> >>   user  system elapsed
> >>   0.36    0.00    0.36
> >>
> >>
> >> Also lsfit() is a bit quicker than lm or lm.fit.
> >>
> >> Regards,
> >> Kenn
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Python and R

Reply via email to