On Jul 8, 2009, at 8:51 AM, Whit Armstrong wrote:

I'm running a huge number of regressions in a loop, so I tried lm.fit
for a speedup.  However, I would like to be able to calculate the
t-stats for the coefficients.

Does anyone have some functions for calculating the regression summary
stats of an lm.fit object?

Thanks,
Whit



Whit, depending upon just how much time savings you are realizing by using lm.fit() and not lm(), the approach to your question may vary.

Do you need all of the models, or only a subset?

If the latter, then I would narrow down your model set and re-run them with lm() so that you can use summary.lm() directly. That would entail less custom coding, which may otherwise offset any time savings from using lm.fit()

If the former, then there are two choices as I see them.

The first would be to restructure the object resulting from lm.fit() by adding the elements required to run summary.lm(). However, I would think that this overhead would bring you back to a point where just using lm() would be a better approach from a time standpoint.

The second would be to cook up a function that only provides the subset of results that you need from summary.lm() and then use that on the results of lm.fit(). Here again, there remains the question of just how much time are you saving using lm.fit() versus the additional overhead of calculating even a subset of the output.

Here is a very simple approach to a function that would get you a subset of the output that you would get using, for example, coef(summary(lm.object)). This is using a selective approach of copying and slightly editing code from summary.lm(). Note that there is other code in summary.lm() to handle weights and such, if your models are more complex. You would need to add that in if that is the case.

If you need much more summary output than this on each model, then I think you would be better off just using lm() and summary.lm().


# Use at your own risk...untested on more complex models  :-)

# 'x' is an lm.fit object

calc.lm.t <- function(x)
{
  Qr <- x$qr
  r <- x$residuals
  p <- x$rank
  p1 <- 1L:p
  rss <- sum(r^2)

  n <- NROW(Qr$qr)
  rdf <- n - p

  resvar <- rss/rdf
  R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
  se <- sqrt(diag(R) * resvar)

  est <- x$coefficients[Qr$pivot[p1]]
  tval <- est/se

  res <- cbind(est = est, se = se, tval = tval)
  res
}



Here is some simple example data:

set.seed(1)
y <- rnorm(100)
x <- rnorm(100)


# Get the default coefficient output using summary.lm()
> coef(summary(lm(y ~ x)))
                 Estimate Std. Error     t value  Pr(>|t|)
(Intercept)  0.1088521158 0.09034800  1.20480938 0.2311784
x           -0.0009323697 0.09472155 -0.00984327 0.9921663



# Now use calc.lm.t

lmf <- lm.fit(model.matrix(y ~ x), y)

> calc.lm.t(lmf)
                      est         se        tval
(Intercept)  0.1088521158 0.09034800  1.20480938
x           -0.0009323697 0.09472155 -0.00984327



I'll leave it to you to see whether this approach may or may not be helpful from a time perspective.

HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to