Marc, Thanks very much for your detailed reply. I'll give your code a try and post back the time difference.
Cheers, Whit On Wed, Jul 8, 2009 at 10:50 AM, Marc Schwartz<marc_schwa...@me.com> wrote: > On Jul 8, 2009, at 8:51 AM, Whit Armstrong wrote: > >> I'm running a huge number of regressions in a loop, so I tried lm.fit >> for a speedup. However, I would like to be able to calculate the >> t-stats for the coefficients. >> >> Does anyone have some functions for calculating the regression summary >> stats of an lm.fit object? >> >> Thanks, >> Whit > > > > Whit, depending upon just how much time savings you are realizing by using > lm.fit() and not lm(), the approach to your question may vary. > > Do you need all of the models, or only a subset? > > If the latter, then I would narrow down your model set and re-run them with > lm() so that you can use summary.lm() directly. That would entail less > custom coding, which may otherwise offset any time savings from using > lm.fit() > > If the former, then there are two choices as I see them. > > The first would be to restructure the object resulting from lm.fit() by > adding the elements required to run summary.lm(). However, I would think > that this overhead would bring you back to a point where just using lm() > would be a better approach from a time standpoint. > > The second would be to cook up a function that only provides the subset of > results that you need from summary.lm() and then use that on the results of > lm.fit(). Here again, there remains the question of just how much time are > you saving using lm.fit() versus the additional overhead of calculating even > a subset of the output. > > Here is a very simple approach to a function that would get you a subset of > the output that you would get using, for example, coef(summary(lm.object)). > This is using a selective approach of copying and slightly editing code from > summary.lm(). Note that there is other code in summary.lm() to handle > weights and such, if your models are more complex. You would need to add > that in if that is the case. > > If you need much more summary output than this on each model, then I think > you would be better off just using lm() and summary.lm(). > > > # Use at your own risk...untested on more complex models :-) > > # 'x' is an lm.fit object > > calc.lm.t <- function(x) > { > Qr <- x$qr > r <- x$residuals > p <- x$rank > p1 <- 1L:p > rss <- sum(r^2) > > n <- NROW(Qr$qr) > rdf <- n - p > > resvar <- rss/rdf > R <- chol2inv(Qr$qr[p1, p1, drop = FALSE]) > se <- sqrt(diag(R) * resvar) > > est <- x$coefficients[Qr$pivot[p1]] > tval <- est/se > > res <- cbind(est = est, se = se, tval = tval) > res > } > > > > Here is some simple example data: > > set.seed(1) > y <- rnorm(100) > x <- rnorm(100) > > > # Get the default coefficient output using summary.lm() >> coef(summary(lm(y ~ x))) > Estimate Std. Error t value Pr(>|t|) > (Intercept) 0.1088521158 0.09034800 1.20480938 0.2311784 > x -0.0009323697 0.09472155 -0.00984327 0.9921663 > > > > # Now use calc.lm.t > > lmf <- lm.fit(model.matrix(y ~ x), y) > >> calc.lm.t(lmf) > est se tval > (Intercept) 0.1088521158 0.09034800 1.20480938 > x -0.0009323697 0.09472155 -0.00984327 > > > > I'll leave it to you to see whether this approach may or may not be helpful > from a time perspective. > > HTH, > > Marc Schwartz > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.