Hi: On Wed, Dec 15, 2010 at 10:05 PM, Anirban Mukherjee <am...@cornell.edu>wrote:
> Hi all, > > Suppose: > > y<-rnorm(100) > x1<-rnorm(100) > lm.yx<-lm(y~x1) > > To predict from a new data source, one can use: > > # works as expected > dum<-data.frame(x1=rnorm(200)) > predict(lm.yx, newdata=dum) > Yup. > > Suppose lm.yx has been run and we have the lm object. And we have a > dataframe that has columns that don't correspond by name to the > original regressors. I very! naively assumed that doing this (below) > would work. It does not. > > # does not work > lm.yx$coefficients<-c("Intercept", "n.x1") > dum2<-data.frame(Int=rep(1,200), n.x1=rnorm(200)) > predict(lm.yx, newdata=dum2) > > No, it does not. Compare the names of lm.yx$coefficients with those of model.matrix(lm.yx). If they're different, you have a problem. Here's a toy example to illustrate it: x1 <- seq(from = 10, to = 200, by = 10) x2 <- rpois(20, 15) y <- 2 + 0.7 * x1 + 0.5 * x2 + rnorm(20, 0, 0.4) m <- lm(y ~ x1 + x2) u <- data.frame(x1 = c(175, 210), x2 = c(20, 25)) predict(m, newdata = u) 1 2 134.7912 161.7939 coef(m) (Intercept) x1 x2 2.1925618 0.7025116 0.4829565 # No problem thus far. Let's change the names on the prediction data frame: names(u) <- c('x3', 'x4') predict(m, newdata = u) 1 2 3 4 5 6 7 8 15.49611 21.55532 31.47817 37.53737 44.07953 51.58761 58.61272 66.60375 9 10 11 12 13 14 15 16 70.73113 79.20511 86.23023 92.28943 99.31455 108.27149 115.29661 117.49216 17 18 19 20 130.31275 136.85491 142.91412 152.83697 Warning message: 'newdata' had 2 rows but variable(s) found have 20 rows # Change the names on the coefficients names(m$coefficients)[c(2, 3)] <- c('x3', 'x4') coef(m) (Intercept) x3 x4 2.1925618 0.7025116 0.4829565 predict(m, newdata = u) # same as above # \hat{y} = X \hat{\beta}, so let's look at the model matrix: names(model.matrix(m)) NULL # Oops...it's a matrix! colnames(model.matrix(m)) [1] "(Intercept)" "x1" "x2" # ...this is after you redefined the names of the model coefficients. # Now try to fix the names of the model matrix... colnames(model.matrix(m))[c(2, 3)] <- c('x3', 'x4') Error in colnames(model.matrix(m))[c(2, 3)] <- c("x3", "x4") : could not find function "model.matrix<-" There's a way to do this with replacement functions at a fundamental level, but it takes some work. Even if you get that, there will still be potential issues - e.g., if you want to compute confidence/prediction intervals. lm() returns a list object with these names: names(m) [1] "coefficients" "residuals" "effects" "rank" [5] "fitted.values" "assign" "qr" "df.residual" [9] "xlevels" "call" "terms" "model" Now look at the names of the following: m$effects m$xlevels m$call m$terms m$model m$qr Do you really want to go through the model object and change all of that for a few variable names? I know that a simple alternative is to do: > > # because we messed around with the lm object above, re-building > lm.yx<-lm(y~x1) > > # change names of dum2 to match names of coefficients of lm.yx > names(dum2)<-names(coefficients(lm.yx)) > predict(lm.yx, newdata=dum2) > > Is there another way that involves changing the lm object rather than > changing the prediction data.frame? > It seems to me that changing the lm object post hoc is wasted energy. You won't have any problem if you coordinate the names of the prediction data frame with those of the input data, which should ideally be a data frame itself. The most hassle-free strategy is to input a data frame into lm() that contains the variables you want to model, and generate a set of prediction points with a data frame whose columns have the same names as those on the RHS of the model formula. HTH, Dennis > > Thanks, > Anirban > > -- > Anirban Mukherjee | Assistant Professor, Marketing > LKCSB, Singapore Management University > 5056 School of Business, 50 Stamford Road > Singapore 178899 | +65-6828-1932 > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.