Re: [R] Newbie help with ANOVA and lm.

Peter Ehlers Sat, 27 Feb 2010 08:57:15 -0800

On 2010-02-27 8:53, rkevinbur...@charter.net wrote:

Would someone be so kind as to explain in English what the ANOVA code 
(anova.lm) is doing? I am having a hard time reconciling what the textbooks 
present as a brute-force regression with the formula algorithm in 'R'. 
Specifically I see:


     p <- object$rank
     if (p > 0L) {
         p1 <- 1L:p
         comp <- object$effects[p1]
         asgn <- object$assign[object$qr$pivot][p1]
         nmeffects <- c("(Intercept)", attr(object$terms, "term.labels"))
         tlabels <- nmeffects[1 + unique(asgn)]
         ss <- c(unlist(lapply(split(comp^2, asgn), sum)), ssr)
         df <- c(unlist(lapply(split(asgn, asgn), length)), dfr)
     }
     else {
         ss <- ssr
         df <- dfr
         tlabels <- character(0L)
     }
     ms <- ss/df
     f <- ms/(ssr/dfr)
     P <- pf(f, df, dfr, lower.tail = FALSE)


I think I understand the check for 'p' being non-zero. 'p' is essentially the 
number of terms in the model matrix (including the intercept term if it 
exists). So in a mathematical description of a regression that included the 
intercept and one term (like dist ~ speed) you would have a model matrix of a 
column of '1's and then a column of data. The 'assign' would be a vector 
containing [0,1]. So then in finding the degrees of freedom you split the 
assign matrix with itself. I am having a hard time seeing that this ever 
produces degrees of freedom that are different. So I get that the vector 'df' 
would always be something like [2,2,dfr]. But that is obviously wrong. Would 
someone care to enlighten me on what the code above is doing?


split(asgn, asgn) splits the vector (not matrix) 'asgn' into
list components, one per term. lapply() then applies length() to
each component, counting the model-matrix columns that belong to
that term; this count is the term's degrees of freedom.
unlist() removes the list structure, producing a vector of dfs.
For simple regression, this results in c(1,1). The residual
df is then tacked on to give the df vector df = c(1,1,dfr).
For models with an intercept, the first component of df should
always be 1, but the intercept row is dropped from the output table.
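
To see this at the console, here is a minimal sketch that repeats
the relevant lines of the quoted code by hand. It assumes the
built-in 'cars' data and the model dist ~ speed mentioned in the
question; the object names 'fit' and 'tab' are mine and are not
part of anova.lm.

```r
# Simple regression on the built-in 'cars' data (assumed example).
fit <- lm(dist ~ speed, data = cars)

p    <- fit$rank                               # intercept + slope
p1   <- seq_len(p)
comp <- fit$effects[p1]                        # one effect per fitted column
asgn <- fit$assign[fit$qr$pivot][p1]           # term index of each column

# One entry per term: number of columns (df) and sum of squared effects (ss).
df <- unlist(lapply(split(asgn, asgn), length))
ss <- unlist(lapply(split(comp^2, asgn), sum))

# Residual pieces, computed here for an unweighted fit.
dfr <- df.residual(fit)
ssr <- sum(residuals(fit)^2)

print(c(df, dfr))   # term dfs followed by the residual df
tab <- anova(fit)   # the table anova() reports, without the intercept row
print(tab)
```

The entry of 'ss' named "1" should agree with the 'speed' row of
the table, and the entry named "0" is the intercept piece that
the table leaves out.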

With two numerical predictors: y ~ x1 + x2,
you should find that asgn = c(0,1,2) leading to df = c(1,1,1,dfr).
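
A sketch to check this, and to show when the dfs do differ. It
assumes the built-in 'mtcars' data; the helper term_df() is a
hypothetical name of mine, not an R function. A factor with three
levels contributes two columns to the model matrix, so its term
index appears twice in 'asgn' and its df is 2.

```r
# Hypothetical helper: per-term df plus residual df, following the
# split(asgn, asgn) line in the quoted code.
term_df <- function(fit) {
  p    <- fit$rank
  asgn <- fit$assign[fit$qr$pivot][seq_len(p)]
  c(unlist(lapply(split(asgn, asgn), length)), dfr = df.residual(fit))
}

# Two numerical predictors: each term owns exactly one column.
fit_num  <- lm(mpg ~ wt + hp, data = mtcars)
asgn_num <- fit_num$assign[fit_num$qr$pivot][seq_len(fit_num$rank)]
df_num   <- term_df(fit_num)

# One numerical predictor plus a three-level factor: the factor owns two columns.
fit_fac  <- lm(mpg ~ wt + factor(cyl), data = mtcars)
asgn_fac <- fit_fac$assign[fit_fac$qr$pivot][seq_len(fit_fac$rank)]
df_fac   <- term_df(fit_fac)

print(asgn_num); print(df_num)
print(asgn_fac); print(df_fac)
```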

  -Peter Ehlers

Thank you.

Kevin

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Peter Ehlers
University of Calgary
