[Rd] problem in add1's F statistic when data contains NAs?

William Dunlap Tue, 14 May 2013 12:25:34 -0700

Shouldn't the F statistic (and p value) for the x2 term in the following calls
to anova() and add1() be the same?  I think anova() gets it right and add1()
does not.


> d <- data.frame(y=1:10, x1=log(1:10), x2=replace(1/(1:10), 2:3, NA))
> anova(lm(y ~ x1 + x2, data=d))
Analysis of Variance Table

Response: y
          Df    Sum Sq   Mean Sq    F value     Pr(>F)    
x1         1 52.905613 52.905613 1108.61455 4.5937e-07 ***
x2         1  6.355775  6.355775  133.18256 8.5678e-05 ***
Residuals  5  0.238611  0.047722                          
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
> add1(lm(y ~ x1, data=d), y ~ x1 + x2, test="F")
Single term additions

Model:
y ~ x1
       Df Sum of Sq       RSS         AIC   F value     Pr(>F)    
<none>              6.5943869   2.4542182                         
x2      1 6.3557755 0.2386114 -22.0988844 186.45559 2.6604e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
Warning message:
In add1.lm(lm(y ~ x1, data = d), y ~ x1 + x2, test = "F") :
  using the 8/10 rows from a combined fit

It looks like add1 is using 7 instead of 5 for the denominator degrees of
freedom: 7 is the value from the original fit (10 rows minus 3 coefficients),
before the 2 rows containing NAs in x2 were omitted, whereas the 8 rows
actually used give 8 - 3 = 5.

> (6.355775/1) / (0.238611/5)
[1] 133.1827745
> (6.355775/1) / (0.238611/7)
[1] 186.4558843
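One possible workaround, sketched below under the assumption that a test on the rows both models can use is what is wanted (this is not from the original post): drop the incomplete rows before fitting, so that add1() starts from a fit whose residual degrees of freedom already reflect the 8 usable rows. The names dc, fit0, fit1, cmp and a1 are illustrative only.

```r
# Reproduce the example data from the post
d <- data.frame(y = 1:10, x1 = log(1:10), x2 = replace(1/(1:10), 2:3, NA))

# Assumed workaround: keep only rows complete in every variable,
# so neither model silently drops rows later
dc <- d[complete.cases(d), ]          # 8 rows remain

fit0 <- lm(y ~ x1, data = dc)         # residual df = 8 - 2 = 6
fit1 <- lm(y ~ x1 + x2, data = dc)    # residual df = 8 - 3 = 5

# Nested-model F test; the denominator is fit1's residual mean square (5 df)
cmp <- anova(fit0, fit1)
print(cmp)

# add1 on the complete-case fit starts from df.residual = 6,
# so its denominator df for x2 is 6 - 1 = 5 as well
a1 <- add1(fit0, y ~ x1 + x2, test = "F")
print(a1)
```

Both the nested anova() comparison and add1() on the complete-case fit should then report the same F value for x2 as the anova() table above.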

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
