Hi:

On Wed, Dec 15, 2010 at 10:05 PM, Anirban Mukherjee <am...@cornell.edu> wrote:

> Hi all,
>
> Suppose:
>
> y<-rnorm(100)
> x1<-rnorm(100)
> lm.yx<-lm(y~x1)
>
> To predict from a new data source, one can use:
>
> # works as expected
> dum<-data.frame(x1=rnorm(200))
> predict(lm.yx, newdata=dum)
>

Yup.

>
> Suppose lm.yx has been run and we have the lm object. And we have a
> dataframe that has columns that don't correspond by name to the
> original regressors. I very! naively assumed that doing this (below)
> would work. It does not.
>
> # does not work
> names(lm.yx$coefficients)<-c("Intercept", "n.x1")
> dum2<-data.frame(Int=rep(1,200), n.x1=rnorm(200))
> predict(lm.yx, newdata=dum2)
>

No, it does not. Compare the names of lm.yx$coefficients with those of
model.matrix(lm.yx). If they're different, you have a problem. Here's a toy
example to illustrate it:

x1 <- seq(from = 10, to = 200, by = 10)
x2 <- rpois(20, 15)
y <- 2 + 0.7 * x1 + 0.5 * x2 + rnorm(20, 0, 0.4)
m <- lm(y ~ x1 + x2)
u <- data.frame(x1 = c(175, 210), x2 = c(20, 25))
predict(m, newdata = u)
       1        2
134.7912 161.7939
coef(m)
(Intercept)          x1          x2
  2.1925618   0.7025116   0.4829565

# No problem thus far. Let's change the names on the prediction data frame:

names(u) <- c('x3', 'x4')
predict(m, newdata = u)
        1         2         3         4         5         6         7         8
 15.49611  21.55532  31.47817  37.53737  44.07953  51.58761  58.61272  66.60375
        9        10        11        12        13        14        15        16
 70.73113  79.20511  86.23023  92.28943  99.31455 108.27149 115.29661 117.49216
       17        18        19        20
130.31275 136.85491 142.91412 152.83697
Warning message:
'newdata' had 2 rows but variable(s) found have 20 rows
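
Why 20 numbers instead of 2? predict() looks for each variable in the model
formula in newdata first and, when it isn't there, falls back to the
environment the formula was created in, where the original x1 and x2 still
live; hence a warning rather than an error. A minimal sketch of a guard that
fails loudly instead; check_newdata() is a hypothetical helper (not part of
base R), and set.seed(1) is added only for reproducibility:

```r
# Toy data as in the example above (seed added for reproducibility)
set.seed(1)
x1 <- seq(from = 10, to = 200, by = 10)
x2 <- rpois(20, 15)
y <- 2 + 0.7 * x1 + 0.5 * x2 + rnorm(20, 0, 0.4)
m <- lm(y ~ x1 + x2)

# Hypothetical helper: stop if newdata lacks any predictor the model needs
check_newdata <- function(model, newdata) {
  # Variables on the RHS of the formula (response dropped)
  needed <- all.vars(delete.response(terms(model)))
  missing <- setdiff(needed, names(newdata))
  if (length(missing) > 0)
    stop("newdata is missing: ", paste(missing, collapse = ", "))
  invisible(TRUE)
}

u <- data.frame(x3 = c(175, 210), x4 = c(20, 25))
res <- tryCatch(check_newdata(m, u), error = function(e) conditionMessage(e))
res   # "newdata is missing: x1, x2"
```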

# Change the names on the coefficients
names(m$coefficients)[c(2, 3)] <- c('x3', 'x4')
coef(m)
(Intercept)          x3          x4
  2.1925618   0.7025116   0.4829565
predict(m, newdata = u)       # same as above

# \hat{y} = X \hat{\beta}, so let's look at the model matrix:
names(model.matrix(m))
NULL                                         # Oops...it's a matrix!
colnames(model.matrix(m))
[1] "(Intercept)" "x1"          "x2"

# ...this is after you redefined the names of the model coefficients.
# Now try to fix the names of the model matrix...
colnames(model.matrix(m))[c(2, 3)] <- c('x3', 'x4')
Error in colnames(model.matrix(m))[c(2, 3)] <- c("x3", "x4") :
  could not find function "model.matrix<-"
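
Since \hat{y} = X \hat{\beta}, point predictions can also be computed by hand,
which sidesteps names altogether. This is a sketch assuming the new data's
columns are in the same order as the model's predictors and that the model
has only plain numeric terms (no factors, interactions or transformations);
it gives no standard errors or intervals. set.seed(1) is added only for
reproducibility:

```r
set.seed(1)
x1 <- seq(from = 10, to = 200, by = 10)
x2 <- rpois(20, 15)
y <- 2 + 0.7 * x1 + 0.5 * x2 + rnorm(20, 0, 0.4)
m <- lm(y ~ x1 + x2)

# New data with arbitrary column names, in the same order as x1, x2
u <- data.frame(x3 = c(175, 210), x4 = c(20, 25))

# Prepend a column of 1s for the intercept, then multiply by the coefficients
X <- cbind(1, as.matrix(u))
yhat <- drop(X %*% coef(m))

# Compare with predict() using correctly named prediction data
ref <- predict(m, newdata = data.frame(x1 = u$x3, x2 = u$x4))
all.equal(unname(yhat), unname(ref))
```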

There's a way to do this with replacement functions at a fundamental
level, but it takes some work. Even if you manage that, there will still
be potential issues, e.g., if you want to compute confidence or
prediction intervals.
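
Because intervals draw on more of the lm object than its coefficients (the
QR decomposition and residual degrees of freedom, among others), the least
invasive fix is usually to rename the prediction data to match the model.
A sketch; name_map is a hypothetical lookup from the new column names to the
model's predictor names, and set.seed(1) is added only for reproducibility:

```r
set.seed(1)
x1 <- seq(from = 10, to = 200, by = 10)
x2 <- rpois(20, 15)
y <- 2 + 0.7 * x1 + 0.5 * x2 + rnorm(20, 0, 0.4)
m <- lm(y ~ x1 + x2)

u <- data.frame(x3 = c(175, 210), x4 = c(20, 25))

# Hypothetical lookup: column name in newdata -> predictor name in the model
name_map <- c(x3 = "x1", x4 = "x2")
names(u) <- name_map[names(u)]   # names(u) is now c("x1", "x2")

# Intervals work because the lm object itself is untouched
pi_u <- predict(m, newdata = u, interval = "prediction", level = 0.95)
pi_u   # 2 x 3 matrix with columns fit, lwr, upr
```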

lm() returns a list object with these names:
names(m)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"

Now look at the names of the following:

m$effects
m$xlevels
m$call
m$terms
m$model
m$qr

Do you really want to go through the model object and change all of that for
a few variable names?
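
If you really want the model itself to use the new names, refitting is
probably less work than editing each component by hand. A sketch, assuming
a simple model whose model frame (m$model) holds the raw variables; a term
such as log(x1) would appear there under the column name "log(x1)" and would
need more care. set.seed(1) is added only for reproducibility:

```r
set.seed(1)
x1 <- seq(from = 10, to = 200, by = 10)
x2 <- rpois(20, 15)
y <- 2 + 0.7 * x1 + 0.5 * x2 + rnorm(20, 0, 0.4)
m <- lm(y ~ x1 + x2)

# m$model is the model frame: columns y, x1, x2
mf <- m$model
names(mf) <- c("y", "x3", "x4")

# Refit under the new names; every component of m2 now uses them
m2 <- lm(y ~ x3 + x4, data = mf)

u <- data.frame(x3 = c(175, 210), x4 = c(20, 25))
p2 <- predict(m2, newdata = u)
```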

> I know that a simple alternative is to do:
>
> # because we messed around with the lm object above, re-building
> lm.yx<-lm(y~x1)
>
> # change names of dum2 to match names of coefficients of lm.yx
> names(dum2)<-names(coefficients(lm.yx))
> predict(lm.yx, newdata=dum2)
>
> Is there another way that involves changing the lm object rather than
> changing the prediction data.frame?
>

It seems to me that changing the lm object post hoc is wasted energy. You
won't have any problem if you coordinate the names of the prediction data
frame with those of the input data, which should ideally be a data frame
itself. The most hassle-free strategy is to pass lm() a data frame (via its
data argument) containing the variables you want to model, and to generate
the prediction points in a data frame whose columns have the same names as
the variables on the RHS of the model formula.
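
A sketch of that workflow with the toy example, assuming the variables can
be collected into one data frame (dat and new_pts are illustrative names;
set.seed(1) is added only for reproducibility):

```r
set.seed(1)
# Keep the modelling variables together in one data frame
dat <- data.frame(x1 = seq(from = 10, to = 200, by = 10),
                  x2 = rpois(20, 15))
dat$y <- 2 + 0.7 * dat$x1 + 0.5 * dat$x2 + rnorm(20, 0, 0.4)

m <- lm(y ~ x1 + x2, data = dat)

# Prediction points: columns named exactly as on the RHS of the formula
new_pts <- data.frame(x1 = c(175, 210), x2 = c(20, 25))
p <- predict(m, newdata = new_pts, interval = "confidence")
p   # one row per prediction point: fit, lwr, upr
```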

HTH,
Dennis

>
> Thanks,
> Anirban
>
> --
> Anirban Mukherjee | Assistant Professor, Marketing
> LKCSB, Singapore Management University
> 5056 School of Business, 50 Stamford Road
> Singapore 178899 | +65-6828-1932
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
