I would have thought that:

> lm( C1 ~ M^2, data=DF )

would give the main effects and 2-way interaction(s) (but a quick test
did not match my expectation). Possibly a feature request is in order if
people plan to use this a lot.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Ted Harding
> Sent: Sunday, May 11, 2008 2:07 PM
> To: Myers, Brent
> Cc: r-help@r-project.org
> Subject: Re: [R] Fundamental formula and dataframe question.
>
> On 11-May-08 18:58:45, Myers, Brent wrote:
> > There is a very useful and apparently fundamental feature of R (or of
> > the package pls) which I don't understand.
> >
> > For datasets with many independent (X) variables such as chemometric
> > datasets there is a convenient formula and dataframe construction that
> > allows one to access the entire X matrix with a single term.
> >
> > Consider the gasoline dataset available in the pls package. For the
> > model statement in the plsr function one can write: Octane ~ NIR
> >
> > NIR refers to a (wide) matrix which is a portion of a dataframe. The
> > naming of the columns is of the form: 'NIR.xxxx nm'
> >
> > names(gasoline) returns...
> >
> > $names
> > [1] "octane" "NIR"
> >
> > instead of...
> >
> > $names
> > [1] "octane" "NIR.1000 nm" "NIR.1001 nm" ...
> >
> > How do I construct and manipulate such dataframes and the column names
> > that go with?
> >
> > Does the use of these types of formulas and dataframes generalize to
> > other modeling functions?
> >
> > Some specific clues on a help search might be enough, I've tried many.
> >
> > Regards,
> > Brent
>
> I don't have the 'gasoline' dataset to hand, but I can produce
> something to which your description applies as follows:
>
>   C1 <- c(1.1,1.2,1.3,1.4)
>   C2 <- c(2.1,2.2,2.3,2.4)
>   M  <- cbind(M1=c(11.1,11.2,11.3,11.4),
>               M2=c(12.1,12.2,12.3,12.4))
>   DF <- data.frame(C1=C1,C2=C2,M=M)
>   DF
> #    C1  C2 M.M1 M.M2
> # 1 1.1 2.1 11.1 12.1
> # 2 1.2 2.2 11.2 12.2
> # 3 1.3 2.3 11.3 12.3
> # 4 1.4 2.4 11.4 12.4
>
> so the two columns C1 and C2 have gone in as named, and the
> matrix M (with named columns M1 and M2) has gone in with
> columns M.M1, M.M2
>
> Now let's fuzz the numbers a bit, so that the lm() fit makes sense:
>
>   C1 <- C1 + round(0.1*runif(4),2)
>   C1 <- C1 + round(0.1*runif(4),2)
>   M  <- cbind(M1=c(11.1,11.2,11.3,11.4),
>               M2=c(12.1,12.2,12.3,12.4)) +
>         round(0.1*runif(8),2)
>   DF <- data.frame(C1=C1,C2=C2,M=M)
>   DF
> #     C1  C2  M.M1  M.M2
> # 1 1.21 2.1 11.19 12.13
> # 2 1.34 2.2 11.23 12.23
> # 3 1.38 2.3 11.36 12.30
> # 4 1.50 2.4 11.43 12.48
>
>   summary(lm(C1 ~ M),data=DF)
> # Call:
> # lm(formula = C1 ~ M)
> # Residuals:
> #        1        2        3        4
> # -0.02422  0.02448  0.01309 -0.01335
> # Coefficients:
> #             Estimate Std. Error t value Pr(>|t|)
> # (Intercept) -8.28435    2.48952  -3.328    0.186
> # MM1         -0.05411    0.66909  -0.081    0.949
> # MM2          0.83463    0.50687   1.647    0.347
> # Residual standard error: 0.03919 on 1 degrees of freedom
> # Multiple R-Squared: 0.9642, Adjusted R-squared: 0.8925
> # F-statistic: 13.46 on 2 and 1 DF, p-value: 0.1893
>
> In other words, a perfectly standard LM fit, equivalent to
>
>   summary(lm(C1 ~ M[,1]+M[,2]))
>
> (as you can check). So all that looks straightforward.
>
> One thing, however, is not clear to me in this scenario.
> Suppose, for example, that the columns M1 and M2 of M were
> factors (and that you had more rows than I've used above, so
> that the fit is non-trivial).
>
> Then, in the standard specification of an LM, you could write
>
>   summary(lm(C1 ~ M[,1]*M[,2]))
>
> and get the main effects and interactions.
> But how would you do that in the other type of specification:
>
> Where you used
>   summary(lm(C1 ~ M, data=DF))
> to get the equivalent of
>   summary(lm(C1 ~ M[,1]+M[,2]))
> what would you use to get the equivalent of
>   summary(lm(C1 ~ M[,1]*M[,2]))??
>
> Would you have to "spell out" the interaction term[s] in
> additional columns of M?
>
> Hmmm, interesting! I hadn't been aware of this aspect of formula
> and dataframe construction for modelling, until you pointed it out!
>
> Best wishes,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
> Fax-to-email: +44 (0)870 094 0861
> Date: 11-May-08                                       Time: 21:06:49
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
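
For what it is worth, here is a minimal sketch of the two points raised in
this thread, on made-up data. The seed, n, DF2 and the fit_* names are
illustrative assumptions, not taken from the posts above. It assumes that
wrapping the matrix in I() is an acceptable way to keep it as a single
data frame column (much as NIR appears in gasoline), and it gets the
interaction by expanding the matrix into ordinary columns rather than
through the matrix term itself. In a formula, ^ means crossing rather than
arithmetic squaring, so M^2 reduces to M, which would explain why the
quick test above showed no interaction term.

```r
# Illustrative data only (seeded for reproducibility); not the data from the thread.
set.seed(1)
n  <- 20
M  <- cbind(M1 = rnorm(n), M2 = rnorm(n))
C1 <- 1 + 2 * M[, "M1"] - M[, "M2"] + 0.5 * M[, "M1"] * M[, "M2"] +
      rnorm(n, sd = 0.1)

# I() stops data.frame() from splitting M into M.M1, M.M2
DF <- data.frame(C1 = C1, M = I(M))
names(DF)   # "C1" "M"
dim(DF$M)   # 20 rows, 2 columns, stored as one data frame column

# Matrix term: one coefficient per column of M, no interaction
fit_main <- lm(C1 ~ M, data = DF)

# ^ is formula crossing, so M^2 expands to the single term M
fit_sq <- lm(C1 ~ M^2, data = DF)
all.equal(coef(fit_main), coef(fit_sq))

# For the interaction, pass the matrix unnamed so its columns become
# ordinary variables M1 and M2, then cross those
DF2     <- data.frame(C1 = C1, M)
fit_int <- lm(C1 ~ (M1 + M2)^2, data = DF2)   # same model as C1 ~ M1 * M2
names(coef(fit_int))   # "(Intercept)" "M1" "M2" "M1:M2"
```

With many columns, as in a chemometric X matrix, a formula such as
C1 ~ .^2 on the expanded data frame avoids typing the column names, at the
cost of a very large number of interaction terms.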