In modelling functions some people like to use a weight of 0 to drop an observation instead of using a subset value of FALSE. E.g., weights=c(0,1,1,...) instead of subset=c(FALSE, TRUE, TRUE, ...) to drop the first observation.
lm() and summary.lm() appear to treat these in the same way, decrementing the number of degrees of freedom for each dropped observation. However, predict.lm() does not treat them the same. It doesn't seem to diminish the df to account for the 0-weighted observations. E.g., the last printout from the following script is as follows, where predw is the prediction from the fit that used 0-weights and preds is from using FALSE's in the subset argument. Is this difference proper? predw preds $fit $fit fit lwr upr fit lwr upr 1 1.544302 1.389254 1.699350 1 1.544302 1.194879 1.893724 2 1.935504 1.719482 2.151526 2 1.935504 1.448667 2.422341 $se.fit $se.fit 1 2 1 2 0.06723657 0.09367810 0.1097969 0.1529757 $df $df [1] 8 [1] 3 $residual.scale $residual.scale [1] 0.1031362 [1] 0.1684207 ### start of script ### data <- data.frame(x=1:10, y=log(1:10)) fitw <- lm(data=data, y~x, weights=as.numeric(x<=5)) fits <- lm(data=data, y~x, subset=x<=5) fitw$df.residual == 3 && fits$df.residual == 3 # TRUE identical(coef(fitw), coef(fits)) # TRUE sumw <- summary(fitw) sums <- summary(fits) identical(sumw$df, sums$df) # TRUE predw <- predict(fitw, interval="confidence", se.fit=TRUE, newdata=data.frame(x=c(4.5,5.5))) preds <- predict(fits, interval="confidence", se.fit=TRUE, newdata=data.frame(x=c(4.5,5.5))) all.equal(predw, preds) # lots of differences sideBySide <- function (a, b, argNames) { # print objects a and b side by side oldWidth <- options(width = getOption("width")/2 - 4) on.exit(options(oldWidth)) if (missing(argNames)) { argNames <- c(deparse(substitute(a))[1], deparse(substitute(b))[1]) } pa <- capture.output(print(a)) pb <- capture.output(print(b)) nlines <- max(length(pa), length(pb)) length(pa) <- nlines length(pb) <- nlines pb[is.na(pb)] <- "" pa[is.na(pa)] <- "" retval <- cbind(pa, pb, deparse.level = 0) dimnames(retval) <- list(rep("",nrow(retval)), argNames) noquote(retval) } # lwr, upr, se.fit, df, residual.scale differ sideBySide(predw, preds) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel