Re: [Rd] model.frame(), model.matrix(), and derived predictor variables

Gabriel Becker Wed, 28 Aug 2013 14:48:50 -0700

Ben,

It works for me ...
> x = rpois(100, 5) + 1
> y = rnorm(100, x)
> d = data.frame(x,y)
> m <- lm(y~log(x),d)
> update(m,data=model.frame(m))


Call:
lm(formula = y ~ log(x), data = model.frame(m))

Coefficients:
(Intercept)       log(x)
     -4.010        5.817
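
One caveat worth checking: in the session above, x and y were also created as free-standing vectors. When the re-fit does not find x among the model frame's columns, it can fall back to the copy visible from the formula's environment. The sketch below is a minimal illustration under that assumption. refit_from_frame and its x_free argument are hypothetical names, the data values are made up, and the first call assumes that no other object called x is visible from the global environment or search path.

```r
# Illustrative data; x and y live only inside the data frame.
d <- data.frame(x = 1:10, y = log(1:10) + rep(c(-0.1, 0.1), 5))

# Fit inside a function so the formula's environment is the function frame.
# x_free (hypothetical argument) optionally places a loose copy of x there.
refit_from_frame <- function(d, x_free = NULL) {
  if (!is.null(x_free)) x <- x_free   # free-standing x visible to the formula
  m <- lm(y ~ log(x), data = d)
  # The model frame has columns "y" and "log(x)", but no column "x".
  tryCatch(update(m, data = model.frame(m)),
           error = function(e) conditionMessage(e))
}

res_missing <- refit_from_frame(d)                # no loose x: error message
res_found   <- refit_from_frame(d, x_free = d$x)  # loose x: re-fit succeeds

is.character(res_missing)
inherits(res_found, "lm")
```

If this is what is happening, the re-fit only matches the original fit as long as the loose x is unchanged; it is not really using the stored model frame for the predictor.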



You can also re-fit using the model matrix directly. In your example,
model.matrix(m) and the response from the model frame can be passed
directly to lm.fit / lm.wfit.
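
A minimal sketch of that route, assuming simulated data like the example above but with x and y stored only as columns of d. The seed and the names X, y_resp and refit are arbitrary choices for illustration.

```r
set.seed(1)                                 # arbitrary seed for reproducibility
d <- data.frame(x = rpois(100, 5) + 1)
d$y <- rnorm(100, d$x)

m <- lm(y ~ log(x), data = d)

X      <- model.matrix(m)                   # design matrix: (Intercept), log(x)
y_resp <- model.response(model.frame(m))    # response stored in the model frame

refit <- lm.fit(X, y_resp)                  # low-level fit, no formula needed
all.equal(refit$coefficients, coef(m))
```

Note that lm.fit returns a plain list rather than an object of class "lm", so methods such as summary() or predict() do not apply to the result directly.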


~G

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1




On Sat, Aug 24, 2013 at 7:40 PM, Ben Bolker <bbol...@gmail.com> wrote:

>
>   Bump: just trying one more time to see if anyone had thoughts on this
> (so far it's just <crickets> ...)
>
>
> -------- Original Message --------
> Subject: model.frame(), model.matrix(), and derived predictor variables
> Date: Sat, 17 Aug 2013 12:19:58 -0400
> From: Ben Bolker <bbol...@gmail.com>
> To: r-de...@stat.math.ethz.ch <r-de...@stat.math.ethz.ch>
>
>
>   Dear r-developers:
>
>   I am struggling with some fundamental aspects of model.frame().
>
>   Conceptually, I think of a flow from data -> model.frame() ->
> model.matrix; the data contain _input variables_, while model.matrix
> contains _predictor variables_: data have been transformed, splines and
> polynomials have been expanded into their corresponding
> multi-dimensional bases, and factors have been expanded into appropriate
> sets of dummy variables depending on their contrasts.
>   I originally thought of model.frame() as containing input variables as
> well (but with only the variables needed by the model, and with cases
> containing NAs handled according to the relevant na.action setting), but
> that's not quite true.  While factors are retained as-is, splines and
> polynomials and parameter transformations are evaluated. For example
>
> d <- data.frame(x=1:10,y=1:10)
> model.frame(y~log(x),d)
>
> produces a model frame with columns 'y', 'log(x)' (not 'y', 'x').
>
> This makes it hard (impossible?) to use the model frame to re-evaluate
> the existing formula in a model, e.g.
>
> m <- lm(y~log(x),d)
> update(m,data=model.frame(m))
> ## Error in eval(expr, envir, enclos) : object 'x' not found
>
> It seems to me that this is a reasonable thing to want to do
> (i.e. use the model frame as a stored copy of the data that
>  can be used for additional model operations); otherwise, I
> either need to carry along an additional copy of the data in
> a slot, or hope that the model is still living in an environment
> where it can find a copy of the original data.
>
> Does anyone have any insights into the original design choices,
> or suggestions about how they have handled this within their own
> code? Do you just add an additional data slot to the model?  I've
> considered trying to write some kind of 'augmented' model frame, that
> would contain the equivalent of
> setdiff(all.vars(formula), names(model.frame(m))) [i.e.  all input variables
> that appeared in the formula but not in the model frame ...].
>
>   thanks
>    Ben Bolker
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis


