Thanks very much Dennis and Michael. That makes sense. I had some very sloppy code which did the equivalent of
y<-rnorm(100) x<-list() x[[1]]<-rnorm(100) x[[2]]<-rnorm(100) lm.yx<-lm(y~x[[1]]+x[[2]]) To predict from lm.yx, I was trying to feed predict.lm a data.frame with columns called x[[1]] and x[[2]]. Of course, in retrospect, that makes no sense. Following on Michael's suggestion, I did predict(lm.yx, newdata=list(x=list(rnorm(100), rnorm(100)))) which works as expected. I agree, messing with the lm object is definitely a bad idea. I was just having a really hard time feeding predict.lm with the correct objects and getting desperate. Thanks again! Best, Anirban -- Anirban Mukherjee | Assistant Professor, Marketing LKCSB, Singapore Management University 5056 School of Business, 50 Stamford Road Singapore 178899 | +65-6828-1932 On Thu, Dec 16, 2010 at 3:22 PM, Dennis Murphy <djmu...@gmail.com> wrote: > Hi: > > On Wed, Dec 15, 2010 at 10:05 PM, Anirban Mukherjee <am...@cornell.edu> > wrote: >> >> Hi all, >> >> Suppose: >> >> y<-rnorm(100) >> x1<-rnorm(100) >> lm.yx<-lm(y~x1) >> >> To predict from a new data source, one can use: >> >> # works as expected >> dum<-data.frame(x1=rnorm(200)) >> predict(lm.yx, newdata=dum) > > Yup. >> >> Suppose lm.yx has been run and we have the lm object. And we have a >> dataframe that has columns that don't correspond by name to the >> original regressors. I very! naively assumed that doing this (below) >> would work. It does not. >> >> # does not work >> lm.yx$coefficients<-c("Intercept", "n.x1") >> dum2<-data.frame(Int=rep(1,200), n.x1=rnorm(200)) >> predict(lm.yx, newdata=dum2) >> > No, it does not. Compare the names of lm.yx$coefficients with those of > model.matrix(lm.yx). If they're different, you have a problem. Here's a toy > example to illustrate it: > > x1 <- seq(from = 10, to = 200, by = 10) > x2 <- rpois(20, 15) > y <- 2 + 0.7 * x1 + 0.5 * x2 + rnorm(20, 0, 0.4) > m <- lm(y ~ x1 + x2) > u <- data.frame(x1 = c(175, 210), x2 = c(20, 25)) > predict(m, newdata = u) > 1 2 > 134.7912 161.7939 > coef(m) > (Intercept) x1 x2 > 2.1925618 0.7025116 0.4829565 > > # No problem thus far. Let's change the names on the prediction data frame: > > names(u) <- c('x3', 'x4') > predict(m, newdata = u) > 1 2 3 4 5 6 > 7 8 > 15.49611 21.55532 31.47817 37.53737 44.07953 51.58761 58.61272 > 66.60375 > 9 10 11 12 13 14 15 > 16 > 70.73113 79.20511 86.23023 92.28943 99.31455 108.27149 115.29661 > 117.49216 > 17 18 19 20 > 130.31275 136.85491 142.91412 152.83697 > Warning message: > 'newdata' had 2 rows but variable(s) found have 20 rows > > # Change the names on the coefficients > names(m$coefficients)[c(2, 3)] <- c('x3', 'x4') > coef(m) > (Intercept) x3 x4 > 2.1925618 0.7025116 0.4829565 > predict(m, newdata = u) # same as above > > # \hat{y} = X \hat{\beta}, so let's look at the model matrix: > names(model.matrix(m)) > NULL # Oops...it's a matrix! > colnames(model.matrix(m)) > [1] "(Intercept)" "x1" "x2" > > # ...this is after you redefined the names of the model coefficients. > # Now try to fix the names of the model matrix... > colnames(model.matrix(m))[c(2, 3)] <- c('x3', 'x4') > Error in colnames(model.matrix(m))[c(2, 3)] <- c("x3", "x4") : > could not find function "model.matrix<-" > > There's a way to do this with replacement functions at a > fundamental level, but it takes some work. Even if you get that, > there will still be potential issues - e.g., if you want to compute > confidence/prediction intervals. > > lm() returns a list object with these names: > names(m) > [1] "coefficients" "residuals" "effects" "rank" > [5] "fitted.values" "assign" "qr" "df.residual" > [9] "xlevels" "call" "terms" "model" > > Now look at the names of the following: > > m$effects > m$xlevels > m$call > m$terms > m$model > m$qr > > Do you really want to go through the model object and change all of that for > a few variable names? > >> I know that a simple alternative is to do: >> >> # because we messed around with the lm object above, re-building >> lm.yx<-lm(y~x1) >> >> # change names of dum2 to match names of coefficients of lm.yx >> names(dum2)<-names(coefficients(lm.yx)) >> predict(lm.yx, newdata=dum2) >> >> Is there another way that involves changing the lm object rather than >> changing the prediction data.frame? > > It seems to me that changing the lm object post hoc is wasted energy. You > won't have any problem if you coordinate the names of the prediction data > frame with those of the input data, which should ideally be a data frame > itself. The most hassle-free strategy is to input a data frame into lm() > that contains the variables you want to model, and generate a set of > prediction points with a data frame whose columns have the same names as > those on the RHS of the model formula. > > HTH, > Dennis >> >> Thanks, >> Anirban >> >> -- >> Anirban Mukherjee | Assistant Professor, Marketing >> LKCSB, Singapore Management University >> 5056 School of Business, 50 Stamford Road >> Singapore 178899 | +65-6828-1932 >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.