Dear Paul, Isn't this really a question for r-help rather than r-package-devel?
In any event, I think that your message is based on (an entirely understandable) misunderstanding of what the : operator on the right-hand side of a model formula does. The : operator is equivalent to %in%, and used as you have will produce a nested model structure. If you insure that all lower-order terms (i.e., all terms marginal to an interaction) are present, : may be used to create interactions, but as a general matter formulas do not like to create model matrices that violate marginality. Consider the following simplified (equivalent) examples, using your "data": ---------- snip -------------- > cbind(dat[, 2:3], stats::model.matrix(y ~ x1 + x1:x2, dat)) x1 x2 (Intercept) x11 x10:x21 x11:x21 1 0 0 1 0 0 0 2 0 0 1 0 0 0 3 0 1 1 0 1 0 4 0 1 1 0 1 0 5 1 0 1 1 0 0 6 1 0 1 1 0 0 7 1 1 1 1 0 1 8 1 1 1 1 0 1 > > cbind(dat[, 2:3], stats::model.matrix(y ~ x1 + x2 %in% x1, dat)) x1 x2 (Intercept) x11 x10:x21 x11:x21 1 0 0 1 0 0 0 2 0 0 1 0 0 0 3 0 1 1 0 1 0 4 0 1 1 0 1 0 5 1 0 1 1 0 0 6 1 0 1 1 0 0 7 1 1 1 1 0 1 8 1 1 1 1 0 1 > > cbind(dat[, 2:3], stats::model.matrix(y ~ x1/x2, dat)) x1 x2 (Intercept) x11 x10:x21 x11:x21 1 0 0 1 0 0 0 2 0 0 1 0 0 0 3 0 1 1 0 1 0 4 0 1 1 0 1 0 5 1 0 1 1 0 0 6 1 0 1 1 0 0 7 1 1 1 1 0 1 8 1 1 1 1 0 1 ---------- snip -------------- In the first two cases, reversing the order of the operands, x2:x1 and x2 %in% x1, would also produce the same results. I hope this helps, John ----------------------------- John Fox, Professor McMaster University Hamilton, Ontario Canada L8S 4M4 Web: socserv.mcmaster.ca/jfox > -----Original Message----- > From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On > Behalf Of Paul Buerkner > Sent: March 30, 2017 6:54 AM > To: r-package-devel@r-project.org > Subject: [R-pkg-devel] Problem in stats::model.matrix when omitting two- > way interactions > > Hi all, > > recently I stumbled upen a problem in stats::model.matrix that I think is > worth reporting. > > When I run: > > > dat <- data.frame( > > y = rnorm(8), > > x1 = factor(rep(0:1, each = 4)), > > x2 = factor(rep(rep(0:1, each = 2), 2)), > > x3 = factor(rep(0:1, 4)) > > ) > > > > stats::model.matrix(y ~ x1+x2+x3 + x1:x2:x3, dat) > > I get a matrix with 12 columns, which are linearily dependent and thus not > identified in a linear model: > > > summary(lm(y ~ x1+x2+x3 + x1:x2:x3, dat)) > > Of course, there is usually no need for such a formula that ignores the two- > way interactions, but from my point of view, model.matrix should still return > only 8 columns (or less) in order to produce identified models. > > I wonder if this is some sort of intendend behavior or just a side effect of > the > way model.matrix handles factors. > > Many thanks in advance. > > Paul > > [[alternative HTML version deleted]] > > ______________________________________________ > R-package-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-package-devel ______________________________________________ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel