On Thu, 29 Jul 2010, Peter Dalgaard wrote:

Peter Dalgaard wrote:
William Dunlap wrote:
In modelling functions some people like to use
a weight of 0 to drop an observation instead of
using a subset value of FALSE.  E.g.,
  weights=c(0,1,1,...)
instead of
  subset=c(FALSE, TRUE, TRUE, ...)
to drop the first observation.

lm() and summary.lm() appear to treat these in the
same way, decrementing the number of degrees of
freedom for each dropped observation.  However,
predict.lm() does not treat them the same.  It
doesn't seem to diminish the df to account for the
0-weighted observations.   E.g., the last printout
from the following script is as follows, where
predw is the prediction from the fit that used
0-weights and preds is from using FALSE's in the
subset argument.  Is this difference proper?

Nice catch.

The issue is that the subset fit and the zero-weighted fit are not
completely the same. Notice that the residuals vector has different
length in the two analyses. With a simplified setup:

length(lm(y~1,weights=w)$residuals)
[1] 10
length(lm(y~1,subset=-1)$residuals)
[1] 9
w
 [1] 0 1 1 1 1 1 1 1 1 1

This in turn is what confuses predict.lm because it gets n and residual
df from length(object$residuals). summary.lm() uses NROW(Qr$qr), and I
suppose that predict.lm should follow suit.


...and then when I went to fix it, I found that the actual line in the
sources (stats/R/lm.R) reads

27442     ripley     n <- length(object$residuals) # NROW(object$qr$qr)

so it's been like that since December 2003. I wonder if Brian remembers
what the point was? (27442 was the restructuring into the stats package,
so it might not actually be Brian's code).

At least that wasn't the point of change: the code was the same in R 1.8.1 (pre-split).

I think you will find that 'n' is used in several ways in predict.lm, and since NA-handling was introduced in R 1.8.0 they may differ in value. So the safest route seems to be to change just 'n' in

                df <- n - p


--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to