tmp <- data.frame(y=rnorm(15000),
                  x1=factor(sample(48, 15000, replace=TRUE)),
                  z1=factor(sample(242, 15000, replace=TRUE)))
system.time(
            tmp.aov <- aov(y ~ x1/z1, data=tmp)
            )
## exceeds memory

tmp2 <- data.frame(y=rnorm(15000),
                   x1=factor(sample(48, 15000, replace=TRUE)),
                   z1=factor(sample(5, 15000, replace=TRUE)))
system.time(
            tmp2.aov <- aov(y ~ x1/z1, data=tmp2)
            )
anova(tmp2.aov)
## about 5 seconds

Use data.frames: they make the code easier to read.
Use aov() instead of lm().  The arithmetic is the same,
but the unneeded (aliased) columns of the model matrix X are
handled more gracefully.
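A quick sketch of the "same arithmetic" point (the small data frame d and
its factors a and b are made up for the illustration): aov() is a wrapper
around the same least-squares fit that lm() performs, so the fitted values
agree; aov() simply summarizes by term instead of by coefficient.

```r
## aov() and lm() perform the same least-squares fit;
## aov() reports by term, lm() by individual coefficient.
set.seed(1)
d <- data.frame(y = rnorm(100),
                a = factor(sample(4, 100, replace = TRUE)),
                b = factor(sample(5, 100, replace = TRUE)))
fit.lm  <- lm(y ~ a/b, data = d)
fit.aov <- aov(y ~ a/b, data = d)
all.equal(fitted(fit.lm), fitted(fit.aov))  ## TRUE: identical fits
```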

My guess is that your data have hundreds of distinct values for z1,
so a huge model matrix was allocated.  Globally distinct values of z1
make the design easier to understand, but as you see they are costly
in computer resources.
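Back-of-the-envelope arithmetic (a sketch using the simulated sizes above,
not the exact column count R would pivot away) shows why the first fit
exhausts memory: the dense model matrix for y ~ x1/z1 alone is on the
order of a gigabyte.

```r
## Rough size of the dense model matrix for y ~ x1/z1
## with 48 levels of x1 and 242 levels of z1.
n <- 15000
p <- 1 + (48 - 1) + 48 * (242 - 1)  ## intercept + x1 + x1:z1 dummies
bytes <- n * p * 8                   ## doubles are 8 bytes each
cat(p, "columns, about", round(bytes / 2^30, 1), "GB\n")
## 11616 columns, about 1.3 GB
```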

You can force the values of the second term to be distinct across
levels of x1 with the interaction() function.  Then use the simpler
model and let the linear dependencies work in your favor.
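A toy illustration of what interaction() does (the factors f1 and f2 here
are made up for the example): it builds a single factor whose levels
encode each (f1, f2) pair, so the same z1-style label under different x1
levels becomes a distinct level.

```r
## interaction() combines two factors into one whose levels
## are distinct for every combination of the inputs.
f1 <- factor(c("a", "a", "b", "b"))
f2 <- factor(c(1, 2, 1, 2))
levels(interaction(f1, f2))  ## "a.1" "b.1" "a.2" "b.2"
```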

system.time(
            tmp.aov <- aov(y ~ x1 + interaction(x1, z1), data=tmp)
)
anova(tmp.aov)
## about 6 seconds



Rich

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
