>>>>> Victor Tian <tianx...@gmail.com>
>>>>>     on Thu, 3 Aug 2017 09:49:57 -0400 writes:

> To whom it may concern,
> I happened to run the following R code just to check the layout of the
> output, but found that the code doesn't work the way I thought it should
> work.

yes, your expectations were wrong.

>> lm(rnorm(100) ~ rnorm(100))

> Call:
> lm(formula = rnorm(100) ~ rnorm(100))

> Coefficients:
> (Intercept)
>    -0.07966

> Warning messages:
> 1: In model.matrix.default(mt, mf, contrasts) :
>   the response appeared on the right-hand side and was dropped
> 2: In model.matrix.default(mt, mf, contrasts) :
>   problem with term 1 in model.matrix: no columns are assigned

> It appears that rnorm(100) produces the same array of numbers on both sides
> of the ~ sign.

Indeed.
And all this has nothing to do with lm() but rather with how formulas in R
have been treated probably "forever".

[I assume not only in R, but rather since the time formulas were
 introduced into the S language (for "S version 3") a few years before R
 was born. But I can no longer verify or disprove this assumption.]

Even more revealing may be this:

  > f <- rnorm(9) ~ rnorm(9)
  > str(f)
  Class 'formula'  language rnorm(9) ~ rnorm(9)
    ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
  > (mm <- model.matrix(f))
    (Intercept)
  1           1
  2           1
  3           1
  4           1
  5           1
  6           1
  7           1
  8           1
  9           1
  attr(,"assign")
  [1] 0
  Warning messages:
  1: In model.matrix.default(f) :
    the response appeared on the right-hand side and was dropped
  2: In model.matrix.default(f) :
    problem with term 1 in model.matrix: no columns are assigned
  >

---------

BTW: One of the goals of formulas, notably in R since they got an
environment attached, is a clean way to deal with non-standard
evaluation (=: NSE).
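
To make this concrete, here is a minimal sketch (not part of the original
exchange) of what a formula actually stores. It only assumes base R plus the
'stats' package; the names lhs, rhs, d and fit and the seed are chosen purely
for illustration. The two sides of the formula are kept as unevaluated calls,
and here they are the very same call, so no random numbers have been drawn
when the formula is created. The last lines show one possible way to get two
distinct variables: draw the numbers first, then refer to them by name.

```r
# A formula stores its two sides as unevaluated calls, not as numbers.
f <- rnorm(9) ~ rnorm(9)
lhs <- f[[2]]   # left-hand side of '~'
rhs <- f[[3]]   # right-hand side of '~'

class(lhs)            # "call": nothing has been drawn yet
identical(lhs, rhs)   # TRUE: both sides are the same expression

# Drawing the numbers first and naming them gives two distinct variables.
set.seed(1)           # arbitrary seed, for reproducibility only
d <- data.frame(y = rnorm(100), x = rnorm(100))   # hypothetical data
fit <- lm(y ~ x, data = d)
names(coef(fit))      # "(Intercept)" "x"
```

With named columns the response and the predictor are different terms, so
neither of the two warnings quoted above is triggered.
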
[ Some of us would claim it is the only clean way to deal with NSE in R,
  and that all new functionality using NSE should use formulas; recently,
  however, tidyverse-scholars have claimed to be able to deal with it
  cleanly without the use of formulas, via "tidy evaluation". ]

Using random expressions in a formula is therefore typically not a good
idea, because you don't really know when the terms in the formula will be
evaluated.

For lm() and all other good formula-based statistical modeling functions,
the evaluation happens via model.matrix().
As you've noticed from that warning, model.matrix() tries to help the user
by checking terms and eliminating those that appear on both sides of the '~'.

This has been documented on the help page [ ?model.matrix ] for
(almost exactly 14) years, the "Details:" section ending with

  _> By convention, if the response variable also appears on the
  _> right-hand side of the formula it is dropped (with a warning),
  _> although interactions involving the term are retained.

I hope this explains the issue.

And yes: Do *not* use rnorm() in formulas.

Martin

--
Martin Mächler
Seminar für Statistik, ETH Zürich  //  R Core Team

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel