Hello Tyler,

model.matrix(~(X1+X2+X3)^3-X1:X3)

T_i = X1:X2:X3. Let F_j = X3. (The numerical variables X1 and X2 are
not encoded at all.) Then, again, T_{i(j)} = X1:X2, which in this
example is NOT dropped from the model. Hence the X3 in T_i must be
encoded by contrasts, as indeed it is.

Arie
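P.S. A quick check of this in R itself (a minimal sketch, on the same
design as the script quoted below; grep() on the column names is just
one way to inspect the coding):

m <- expand.grid(X1=c(1,-1), X2=c(1,-1), X3=c("A","B","C"))
mm <- model.matrix(~(X1+X2+X3)^3 - X1:X3, data=m)
# Contrast coding: only X1:X2:X3B and X1:X2:X3C appear, no X1:X2:X3A.
grep("X1:X2:X3", colnames(mm), value=TRUE)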
On Mon, Nov 6, 2017 at 5:09 PM, Tyler <tyle...@gmail.com> wrote:
> Hi Arie,
>
> Given the heuristic, in all of my examples with a missing two-factor
> interaction the three-factor interaction should be coded with dummy
> variables. In reality, it is encoded by dummy variables only when the
> numeric:numeric interaction is missing, and by contrasts for the other
> two. The heuristic does not specify separate behavior for numeric vs.
> categorical factors (when the author of Statistical Models in S refers
> to F_j as a "factor", it is a more general usage than the R type
> "factor" and includes numeric variables--the language used later in the
> chapter, on page 40, confirms this): when there is a missing marginal
> term in the formula, the higher-order interaction should be coded by
> dummy variables, regardless of type. Thus, the terms() function follows
> the cited behavior only a third of the time.
>
> Best regards,
> Tyler
>
> On Mon, Nov 6, 2017 at 6:45 AM, Arie ten Cate <arietenc...@gmail.com> wrote:
>>
>> Hello Tyler,
>>
>> You write that you understand what I am saying. However, I am now at a
>> loss about what exactly the problem with the behavior of R is. Here is
>> a script which reproduces your experiments with three variables
>> (excluding the full model):
>>
>> m=expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))
>> model.matrix(~(X1+X2+X3)^3-X1:X3,data=m)
>> model.matrix(~(X1+X2+X3)^3-X2:X3,data=m)
>> model.matrix(~(X1+X2+X3)^3-X1:X2,data=m)
>>
>> Below are the three results, similar to your first mail. (The first
>> two are basically the same, of course.) Please pick one result which
>> you think is not consistent with the heuristic, and please give what
>> you think is the correct result:
>>
>> model.matrix(~(X1+X2+X3)^3-X1:X3)
>> (Intercept)
>> X1 X2 X3B X3C
>> X1:X2 X2:X3B X2:X3C
>> X1:X2:X3B X1:X2:X3C
>>
>> model.matrix(~(X1+X2+X3)^3-X2:X3)
>> (Intercept)
>> X1 X2 X3B X3C
>> X1:X2 X1:X3B X1:X3C
>> X1:X2:X3B X1:X2:X3C
>>
>> model.matrix(~(X1+X2+X3)^3-X1:X2)
>> (Intercept)
>> X1 X2 X3B X3C
>> X1:X3B X1:X3C X2:X3B X2:X3C
>> X1:X2:X3A X1:X2:X3B X1:X2:X3C
>>
>> (I take it that the combination of X3A and X3B and X3C implies dummy
>> encoding, and the combination of only X3B and X3C implies contrast
>> encoding, with respect to X3A.)
>>
>> Thanks in advance,
>>
>> Arie
>>
>> On Sat, Nov 4, 2017 at 5:33 PM, Tyler <tyle...@gmail.com> wrote:
>> > Hi Arie,
>> >
>> > I understand what you're saying. The following excerpt from the book
>> > shows that F_j does not refer exclusively to categorical factors:
>> > "...the rule does not do anything special for them, and it remains
>> > valid, in a trivial sense, whenever any of the F_j is numeric rather
>> > than categorical." Since F_j refers to both categorical and numeric
>> > variables, the behavior of model.matrix is not consistent with the
>> > heuristic.
>> >
>> > Best regards,
>> > Tyler
>> >
>> > On Sat, Nov 4, 2017 at 6:50 AM, Arie ten Cate <arietenc...@gmail.com>
>> > wrote:
>> >>
>> >> Hello Tyler,
>> >>
>> >> I rephrase my previous mail, as follows:
>> >>
>> >> In your example, T_i = X1:X2:X3. Let F_j = X3. (The numerical
>> >> variables X1 and X2 are not encoded at all.) Then T_{i(j)} = X1:X2,
>> >> which in the example is dropped from the model.
>> >> Hence the X3 in T_i must be encoded by dummy variables, as indeed
>> >> it is.
>> >>
>> >> Arie
>> >>
>> >> On Thu, Nov 2, 2017 at 4:11 PM, Tyler <tyle...@gmail.com> wrote:
>> >> > Hi Arie,
>> >> >
>> >> > The book on which this behavior is based does not use "factor" (in
>> >> > this section) to refer to a categorical factor. I will again point
>> >> > to this sentence, from page 40, in the same section and referring
>> >> > to the behavior under question, which shows that F_j is not limited
>> >> > to categorical factors: "Numeric variables appear in the
>> >> > computations as themselves, uncoded. Therefore, the rule does not
>> >> > do anything special for them, and it remains valid, in a trivial
>> >> > sense, whenever any of the F_j is numeric rather than categorical."
>> >> >
>> >> > Note the "...whenever any of the F_j is numeric rather than
>> >> > categorical." Factor here is used in the more general sense of the
>> >> > word, not referring to the R type "factor." The behavior of R does
>> >> > not match the heuristic that it cites.
>> >> >
>> >> > Best regards,
>> >> > Tyler
>> >> >
>> >> > On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate <arietenc...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hello Tyler,
>> >> >>
>> >> >> Thank you for searching for, and finding, the basic description of
>> >> >> the behavior of R in this matter.
>> >> >>
>> >> >> I think your example is in agreement with the book.
>> >> >>
>> >> >> But let me first note the following. You write: "F_j refers to a
>> >> >> factor (variable) in a model and not a categorical factor".
>> >> >> However: "a factor is a vector object used to specify a discrete
>> >> >> classification" (start of chapter 4 of "An Introduction to R").
>> >> >> You might also see the description of the R function factor().
>> >> >>
>> >> >> You note that the book says about a factor F_j:
>> >> >> "... F_j is coded by contrasts if T_{i(j)} has appeared in the
>> >> >> formula and by dummy variables if it has not"
>> >> >>
>> >> >> You find:
>> >> >> "However, the example I gave demonstrated that this dummy variable
>> >> >> encoding only occurs for the model where the missing term is the
>> >> >> numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2."
>> >> >>
>> >> >> We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor).
>> >> >> Then T_{i(j)} = X1:X2, which is dropped from the model. Hence the
>> >> >> X3 in T_i must be encoded by dummy variables, as indeed it is.
>> >> >>
>> >> >> Arie
>> >> >>
>> >> >> On Tue, Oct 31, 2017 at 4:01 PM, Tyler <tyle...@gmail.com> wrote:
>> >> >> > Hi Arie,
>> >> >> >
>> >> >> > Thank you for your further research into the issue.
>> >> >> >
>> >> >> > Regarding Stata: On the other hand, JMP gives model matrices
>> >> >> > that use the main effect contrasts in computing the higher-order
>> >> >> > interactions, without the dummy variable encoding. I verified
>> >> >> > this both by analyzing the linear model given in my first
>> >> >> > example and noting that JMP has one more degree of freedom than
>> >> >> > R for the same model, and by looking at the generated model
>> >> >> > matrices. It's easy to find a design where JMP will allow us to
>> >> >> > fit our model with goodness-of-fit estimates and R will not, due
>> >> >> > to the extra degree(s) of freedom required. Let's keep the
>> >> >> > conversation limited to R.
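>> >> >> >
>> >> >> > A minimal sketch of that extra degree of freedom, on the 12-run
>> >> >> > design from my first mail (column counts only):
>> >> >> >
>> >> >> > m <- expand.grid(X1=c(1,-1), X2=c(1,-1), X3=c("A","B","C"))
>> >> >> > # Dropping X1:X3 removes two columns from the 12-column full model:
>> >> >> > ncol(model.matrix(~(X1+X2+X3)^3 - X1:X3, data=m))  # 10
>> >> >> > # Dropping X1:X2 should leave 11 columns, but the dummy
>> >> >> > # re-coding of X1:X2:X3 adds one back:
>> >> >> > ncol(model.matrix(~(X1+X2+X3)^3 - X1:X2, data=m))  # 12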
>> >> >> >
>> >> >> > I want to refocus back onto my original bug report, which was
>> >> >> > not for a missing main effects term, but rather for a missing
>> >> >> > lower-order interaction term. The behavior of
>> >> >> > model.matrix.default() for a missing main effects term is a nice
>> >> >> > example to demonstrate how model.matrix encodes with dummy
>> >> >> > variables instead of contrasts, but it doesn't demonstrate the
>> >> >> > inconsistent behavior my bug report highlighted.
>> >> >> >
>> >> >> > I went looking for documentation on this behavior, and the issue
>> >> >> > stems not from model.matrix.default(), but rather from the
>> >> >> > terms() function in interpreting the formula. This "clever"
>> >> >> > replacement of contrasts by dummy variables to maintain
>> >> >> > marginality (presuming that's the reason) is not described
>> >> >> > anywhere in the documentation for either the model.matrix() or
>> >> >> > the terms() function. In order to find a description of the
>> >> >> > behavior, I had to look in the underlying C code, buried above
>> >> >> > the "TermCode" function of the "model.c" file, which says:
>> >> >> >
>> >> >> > "TermCode decides on the encoding of a model term. Returns 1 if
>> >> >> > variable ``whichBit'' in ``thisTerm'' is to be encoded by
>> >> >> > contrasts and 2 if it is to be encoded by dummy variables. This
>> >> >> > is decided using the heuristic described in Statistical Models
>> >> >> > in S, page 38."
>> >> >> >
>> >> >> > I do not have a copy of this book, and I suspect most R users do
>> >> >> > not either. Thankfully, some of the pages describing this
>> >> >> > behavior were available as part of Amazon's "Look Inside"
>> >> >> > feature--but if not for that, I would have no idea what
>> >> >> > heuristic R was using. Since those pages could be made
>> >> >> > unavailable by Amazon at any time, at the very least we have a
>> >> >> > problem with a lack of documentation.
>> >> >> >
>> >> >> > However, I still believe there is a bug when comparing R's
>> >> >> > implementation to the heuristic described in the book. From
>> >> >> > Statistical Models in S, pages 38-39:
>> >> >> >
>> >> >> > "Suppose F_j is any factor included in term T_i. Let T_{i(j)}
>> >> >> > denote the margin of T_i for factor F_j--that is, the term
>> >> >> > obtained by dropping F_j from T_i. We say that T_{i(j)} has
>> >> >> > appeared in the formula if there is some term T_i' for i' < i
>> >> >> > such that T_i' contains all the factors appearing in T_{i(j)}.
>> >> >> > The usual case is that T_{i(j)} itself is one of the preceding
>> >> >> > terms. Then F_j is coded by contrasts if T_{i(j)} has appeared
>> >> >> > in the formula and by dummy variables if it has not"
>> >> >> >
>> >> >> > Here, F_j refers to a factor (variable) in a model and not a
>> >> >> > categorical factor, as specified later in that section (page
>> >> >> > 40): "Numeric variables appear in the computations as
>> >> >> > themselves, uncoded. Therefore, the rule does not do anything
>> >> >> > special for them, and it remains valid, in a trivial sense,
>> >> >> > whenever any of the F_j is numeric rather than categorical."
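>> >> >> >
>> >> >> > (In R, which margins have appeared in a formula can be read off
>> >> >> > directly; a minimal sketch using terms():)
>> >> >> >
>> >> >> > tl <- attr(terms(~(X1+X2+X3)^3 - X1:X2), "term.labels")
>> >> >> > # The margin X1:X2 of T_i = X1:X2:X3 is absent, so the rule says
>> >> >> > # the X3 in X1:X2:X3 should be coded by dummy variables:
>> >> >> > c("X2:X3", "X1:X3", "X1:X2") %in% tl  # TRUE TRUE FALSE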
>> >> >> > Going back to my original example with three variables: X1
>> >> >> > (numeric), X2 (numeric), X3 (categorical). This heuristic
>> >> >> > prescribes encoding X1:X2:X3 with contrasts as long as X1:X2,
>> >> >> > X1:X3, and X2:X3 exist in the formula. When any of the preceding
>> >> >> > terms does not exist, the heuristic tells us to use dummy
>> >> >> > variables to encode the interaction (e.g. "F_j [the interaction
>> >> >> > term] is coded ... by dummy variables if it [any of the marginal
>> >> >> > terms obtained by dropping a single factor in the interaction]
>> >> >> > has not [appeared in the formula]"). However, the example I gave
>> >> >> > demonstrated that this dummy variable encoding only occurs for
>> >> >> > the model where the missing term is the numeric-numeric
>> >> >> > interaction, "~(X1+X2+X3)^3-X1:X2". Otherwise, the interaction
>> >> >> > term X1:X2:X3 is encoded by contrasts, not dummy variables. This
>> >> >> > is inconsistent with the description of the intended behavior
>> >> >> > given in the book.
>> >> >> >
>> >> >> > Best regards,
>> >> >> > Tyler
>> >> >> >
>> >> >> > On Fri, Oct 27, 2017 at 2:18 PM, Arie ten Cate
>> >> >> > <arietenc...@gmail.com> wrote:
>> >> >> >>
>> >> >> >> Hello Tyler,
>> >> >> >>
>> >> >> >> I want to bring to your attention the following document: "What
>> >> >> >> happens if you omit the main effect in a regression model with
>> >> >> >> an interaction?"
>> >> >> >> (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you-omit-the-main-effect-in-a-regression-model-with-an-interaction).
>> >> >> >> This gives a useful review of the problem. Your example is Case
>> >> >> >> 2: a continuous and a categorical regressor.
>> >> >> >>
>> >> >> >> The numerical examples are coded in Stata, and they give the
>> >> >> >> same result as in R. Hence, if this is a bug in R then it is
>> >> >> >> also a bug in Stata. That seems very unlikely.
>> >> >> >>
>> >> >> >> Here is a simulation in R of the above-mentioned Case 2 in
>> >> >> >> Stata:
>> >> >> >>
>> >> >> >> df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4"))
>> >> >> >> print("Full model")
>> >> >> >> print(model.matrix(~(socst+grp)^2 ,data=df))
>> >> >> >> print("Example 2.1: drop socst")
>> >> >> >> print(model.matrix(~(socst+grp)^2 -socst ,data=df))
>> >> >> >> print("Example 2.2: drop grp")
>> >> >> >> print(model.matrix(~(socst+grp)^2 -grp ,data=df))
>> >> >> >>
>> >> >> >> This gives indeed the following regressors:
>> >> >> >>
>> >> >> >> "Full model"
>> >> >> >> (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4
>> >> >> >> "Example 2.1: drop socst"
>> >> >> >> (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 socst:grp4
>> >> >> >> "Example 2.2: drop grp"
>> >> >> >> (Intercept) socst socst:grp2 socst:grp3 socst:grp4
>> >> >> >>
>> >> >> >> There is a little bit of R documentation about this, based on
>> >> >> >> the concept of marginality, which typically forbids a model
>> >> >> >> having an interaction but not the corresponding main effects.
>> >> >> >> (You might see the references in
>> >> >> >> https://en.wikipedia.org/wiki/Principle_of_marginality)
>> >> >> >> See "An Introduction to R", by Venables and Smith and the R
>> >> >> >> Core Team. At the bottom of page 52 (PDF: 57) it says:
>> >> >> >> "Although the details are complicated, model formulae in R will
>> >> >> >> normally generate the models that an expert statistician would
>> >> >> >> expect, provided that marginality is preserved. Fitting, for [a
>> >> >> >> contrary] example, a model with an interaction but not the
>> >> >> >> corresponding main effects will in general lead to surprising
>> >> >> >> results ....".
>> >> >> >> The Reference Manual states that the R functions dropterm() and
>> >> >> >> addterm() respectively drop or add only terms such that
>> >> >> >> marginality is preserved.
>> >> >> >>
>> >> >> >> Finally, about your singular matrix t(mm)%*%mm. This is in fact
>> >> >> >> Example 2.1 in Case 2 discussed above. As discussed there, in
>> >> >> >> Stata and in R the drop of the continuous variable has no
>> >> >> >> effect on the degrees of freedom here: it is just a
>> >> >> >> reparameterisation of the full model, protecting you against
>> >> >> >> losing marginality... Hence the model matrix 'mm' is still
>> >> >> >> square and nonsingular after the drop of X1, unless of course a
>> >> >> >> row is removed from the matrix 'design' before creating 'mm'.
>> >> >> >>
>> >> >> >> Arie
>> >> >> >>
>> >> >> >> On Sun, Oct 15, 2017 at 7:05 PM, Tyler <tyle...@gmail.com> wrote:
>> >> >> >> > You could possibly try to explain away the behavior for a
>> >> >> >> > missing main effects term, since without the main effects
>> >> >> >> > term we don't have main effect columns in the model matrix
>> >> >> >> > used to compute the interaction columns. (At best this is
>> >> >> >> > undocumented behavior--I still think it's a bug, as we know
>> >> >> >> > how we would encode the categorical factors if they were in
>> >> >> >> > fact present: it's either specified in contrasts.arg or using
>> >> >> >> > the default set in options.) However, when all the main
>> >> >> >> > effects are present, why would the three-factor interaction
>> >> >> >> > column not simply be the product of the main effect columns?
>> >> >> >> > In my example: we know X1, we know X2, and we know X3. Why
>> >> >> >> > does the encoding of X1:X2:X3 depend on whether we specified
>> >> >> >> > a two-factor interaction, and why does it change only for
>> >> >> >> > specific missing interactions?
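>> >> >> >> >
>> >> >> >> > To make that concrete (a minimal sketch on the 12-run design
>> >> >> >> > from my first mail):
>> >> >> >> >
>> >> >> >> > m <- expand.grid(X1=c(1,-1), X2=c(1,-1), X3=c("A","B","C"))
>> >> >> >> > mm <- model.matrix(~(X1+X2+X3)^3 - X1:X2, data=m)
>> >> >> >> > # An extra dummy column for the first level of X3 appears in
>> >> >> >> > # the three-way term, instead of the product encoding:
>> >> >> >> > "X1:X2:X3A" %in% colnames(mm)  # TRUE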
>> >> >> >> >
>> >> >> >> > In addition, I can use a two-term example similar to yours to
>> >> >> >> > show how this behavior results in a singular covariance
>> >> >> >> > matrix when, given the desired factor encoding, it should not
>> >> >> >> > be singular.
>> >> >> >> >
>> >> >> >> > We start with a full factorial design for a two-level
>> >> >> >> > continuous factor and a three-level categorical factor, and
>> >> >> >> > remove a single row. This design matrix does not leave enough
>> >> >> >> > degrees of freedom to determine goodness-of-fit, but should
>> >> >> >> > allow us to obtain parameter estimates.
>> >> >> >> >
>> >> >> >> >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C"))
>> >> >> >> >> design = design[-1,]
>> >> >> >> >> design
>> >> >> >> >   X1 X2
>> >> >> >> > 2 -1  A
>> >> >> >> > 3  1  B
>> >> >> >> > 4 -1  B
>> >> >> >> > 5  1  C
>> >> >> >> > 6 -1  C
>> >> >> >> >
>> >> >> >> > Here, we first calculate the model matrix for the full model,
>> >> >> >> > and then manually remove the X1 column from the model matrix.
>> >> >> >> > This gives us the model matrix one would expect if X1 were
>> >> >> >> > removed from the model. We then successfully calculate the
>> >> >> >> > covariance matrix.
>> >> >> >> >
>> >> >> >> >> mm = model.matrix(~(X1+X2)^2,data=design)
>> >> >> >> >> mm
>> >> >> >> >   (Intercept) X1 X2B X2C X1:X2B X1:X2C
>> >> >> >> > 2           1 -1   0   0      0      0
>> >> >> >> > 3           1  1   1   0      1      0
>> >> >> >> > 4           1 -1   1   0     -1      0
>> >> >> >> > 5           1  1   0   1      0      1
>> >> >> >> > 6           1 -1   0   1      0     -1
>> >> >> >> >
>> >> >> >> >> mm = mm[,-2]
>> >> >> >> >> solve(t(mm) %*% mm)
>> >> >> >> >             (Intercept)  X2B  X2C X1:X2B X1:X2C
>> >> >> >> > (Intercept)           1 -1.0 -1.0    0.0    0.0
>> >> >> >> > X2B                  -1  1.5  1.0    0.0    0.0
>> >> >> >> > X2C                  -1  1.0  1.5    0.0    0.0
>> >> >> >> > X1:X2B                0  0.0  0.0    0.5    0.0
>> >> >> >> > X1:X2C                0  0.0  0.0    0.0    0.5
>> >> >> >> >
>> >> >> >> > Here, we see the actual behavior of model.matrix. The
>> >> >> >> > undesired re-coding of the model matrix interaction term
>> >> >> >> > makes the information matrix singular.
>> >> >> >> >
>> >> >> >> >> mm = model.matrix(~(X1+X2)^2-X1,data=design)
>> >> >> >> >> mm
>> >> >> >> >   (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C
>> >> >> >> > 2           1   0   0     -1      0      0
>> >> >> >> > 3           1   1   0      0      1      0
>> >> >> >> > 4           1   1   0      0     -1      0
>> >> >> >> > 5           1   0   1      0      0      1
>> >> >> >> > 6           1   0   1      0      0     -1
>> >> >> >> >
>> >> >> >> >> solve(t(mm) %*% mm)
>> >> >> >> > Error in solve.default(t(mm) %*% mm) : system is
>> >> >> >> > computationally singular: reciprocal condition number =
>> >> >> >> > 5.55112e-18
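>> >> >> >> >
>> >> >> >> > (The same point via the matrix rank, as a minimal sketch: the
>> >> >> >> > re-coding yields six columns from only five runs, so
>> >> >> >> > t(mm) %*% mm cannot be full rank.)
>> >> >> >> >
>> >> >> >> > design = expand.grid(X1=c(1,-1),X2=c("A","B","C"))[-1,]
>> >> >> >> > mm = model.matrix(~(X1+X2)^2-X1,data=design)
>> >> >> >> > qr(mm)$rank  # 5
>> >> >> >> > ncol(mm)     # 6: rank-deficient, so t(mm) %*% mm is singular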
>> >> >> >> >
>> >> >> >> > I still believe this is a bug.
>> >> >> >> >
>> >> >> >> > Best regards,
>> >> >> >> > Tyler Morgan-Wall
>> >> >> >> >
>> >> >> >> > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate
>> >> >> >> > <arietenc...@gmail.com> wrote:
>> >> >> >> >>
>> >> >> >> >> I think it is not a bug. It is a general property of
>> >> >> >> >> interactions. This property is best observed if all
>> >> >> >> >> variables are factors (qualitative).
>> >> >> >> >>
>> >> >> >> >> For example, you have three variables (factors). You ask for
>> >> >> >> >> as many interactions as possible, except an interaction term
>> >> >> >> >> between two particular variables. When this interaction is
>> >> >> >> >> not a constant, it is different for different values of the
>> >> >> >> >> remaining variable. More precisely: for all values of that
>> >> >> >> >> variable. In other words: you have a three-way interaction,
>> >> >> >> >> with all values of that variable.
>> >> >> >> >>
>> >> >> >> >> An even smaller example is the following script with only
>> >> >> >> >> two variables, each being a factor:
>> >> >> >> >>
>> >> >> >> >> df <- expand.grid(X1=c("p","q"), X2=c("A","B","C"))
>> >> >> >> >> print(model.matrix(~(X1+X2)^2 ,data=df))
>> >> >> >> >> print(model.matrix(~(X1+X2)^2 -X1,data=df))
>> >> >> >> >> print(model.matrix(~(X1+X2)^2 -X2,data=df))
>> >> >> >> >>
>> >> >> >> >> The result is:
>> >> >> >> >>
>> >> >> >> >>   (Intercept) X1q X2B X2C X1q:X2B X1q:X2C
>> >> >> >> >> 1           1   0   0   0       0       0
>> >> >> >> >> 2           1   1   0   0       0       0
>> >> >> >> >> 3           1   0   1   0       0       0
>> >> >> >> >> 4           1   1   1   0       1       0
>> >> >> >> >> 5           1   0   0   1       0       0
>> >> >> >> >> 6           1   1   0   1       0       1
>> >> >> >> >>
>> >> >> >> >>   (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C
>> >> >> >> >> 1           1   0   0       0       0       0
>> >> >> >> >> 2           1   0   0       1       0       0
>> >> >> >> >> 3           1   1   0       0       0       0
>> >> >> >> >> 4           1   1   0       0       1       0
>> >> >> >> >> 5           1   0   1       0       0       0
>> >> >> >> >> 6           1   0   1       0       0       1
>> >> >> >> >>
>> >> >> >> >>   (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C
>> >> >> >> >> 1           1   0       0       0       0       0
>> >> >> >> >> 2           1   1       0       0       0       0
>> >> >> >> >> 3           1   0       1       0       0       0
>> >> >> >> >> 4           1   1       0       1       0       0
>> >> >> >> >> 5           1   0       0       0       1       0
>> >> >> >> >> 6           1   1       0       0       0       1
>> >> >> >> >>
>> >> >> >> >> Thus, in the second result, we have no main effect of X1.
>> >> >> >> >> Instead, the effect of X1 depends on the value of X2: either
>> >> >> >> >> A or B or C. In fact, this is a two-way interaction,
>> >> >> >> >> including all three values of X2. In the third result, we
>> >> >> >> >> have no main effect of X2. The effect of X2 depends on the
>> >> >> >> >> value of X1: either p or q.
>> >> >> >> >>
>> >> >> >> >> A complicating element with your example seems to be that
>> >> >> >> >> your X1 and X2 are not factors.
>> >> >> >> >>
>> >> >> >> >> Arie
>> >> >> >> >>
>> >> >> >> >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <tyle...@gmail.com>
>> >> >> >> >> wrote:
>> >> >> >> >> > Hi,
>> >> >> >> >> >
>> >> >> >> >> > I recently ran into an inconsistency in the way
>> >> >> >> >> > model.matrix.default handles factor encoding for
>> >> >> >> >> > higher-level interactions with categorical variables when
>> >> >> >> >> > the full hierarchy of effects is not present.
>> >> >> >> >> > Depending on which lower-level interactions are specified,
>> >> >> >> >> > the factor encoding changes for a higher-level interaction.
>> >> >> >> >> > Consider the following minimal reproducible example:
>> >> >> >> >> >
>> >> >> >> >> > --------------
>> >> >> >> >> >
>> >> >> >> >> >> runmatrix = expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))
>> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3,data=runmatrix)
>> >> >> >> >> >    (Intercept) X1 X2 X3B X3C X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C
>> >> >> >> >> > 1            1  1  1   0   0     1      0      0      0      0         0         0
>> >> >> >> >> > 2            1 -1  1   0   0    -1      0      0      0      0         0         0
>> >> >> >> >> > 3            1  1 -1   0   0    -1      0      0      0      0         0         0
>> >> >> >> >> > 4            1 -1 -1   0   0     1      0      0      0      0         0         0
>> >> >> >> >> > 5            1  1  1   1   0     1      1      0      1      0         1         0
>> >> >> >> >> > 6            1 -1  1   1   0    -1     -1      0      1      0        -1         0
>> >> >> >> >> > 7            1  1 -1   1   0    -1      1      0     -1      0        -1         0
>> >> >> >> >> > 8            1 -1 -1   1   0     1     -1      0     -1      0         1         0
>> >> >> >> >> > 9            1  1  1   0   1     1      0      1      0      1         0         1
>> >> >> >> >> > 10           1 -1  1   0   1    -1      0     -1      0      1         0        -1
>> >> >> >> >> > 11           1  1 -1   0   1    -1      0      1      0     -1         0        -1
>> >> >> >> >> > 12           1 -1 -1   0   1     1      0     -1      0     -1         0         1
>> >> >> >> >> > attr(,"assign")
>> >> >> >> >> >  [1] 0 1 2 3 3 4 5 5 6 6 7 7
>> >> >> >> >> > attr(,"contrasts")
>> >> >> >> >> > attr(,"contrasts")$X3
>> >> >> >> >> > [1] "contr.treatment"
>> >> >> >> >> >
>> >> >> >> >> > --------------
>> >> >> >> >> >
>> >> >> >> >> > Specifying the full hierarchy gives us what we expect: the
>> >> >> >> >> > interaction columns are simply calculated from the product
>> >> >> >> >> > of the main effect columns. The interaction term X1:X2:X3
>> >> >> >> >> > gives us two columns in the model matrix, X1:X2:X3B and
>> >> >> >> >> > X1:X2:X3C, matching the products of the main effects.
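>> >> >> >> >> >
>> >> >> >> >> > (A quick sketch verifying this on the same matrix:)
>> >> >> >> >> >
>> >> >> >> >> > mm = model.matrix(~(X1+X2+X3)^3,data=runmatrix)
>> >> >> >> >> > # Each interaction column is the elementwise product of
>> >> >> >> >> > # its main effect columns:
>> >> >> >> >> > all(mm[,"X1:X2:X3B"] == mm[,"X1"] * mm[,"X2"] * mm[,"X3B"])  # TRUE
>> >> >> >> >> > all(mm[,"X1:X2:X3C"] == mm[,"X1"] * mm[,"X2"] * mm[,"X3C"])  # TRUE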
>> >> >> >> >> >
>> >> >> >> >> > If we remove either the X2:X3 interaction or the X1:X3
>> >> >> >> >> > interaction, we get what we would expect for the X1:X2:X3
>> >> >> >> >> > interaction, but when we remove the X1:X2 interaction the
>> >> >> >> >> > encoding for X1:X2:X3 changes completely:
>> >> >> >> >> >
>> >> >> >> >> > --------------
>> >> >> >> >> >
>> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix)
>> >> >> >> >> >    (Intercept) X1 X2 X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C
>> >> >> >> >> > 1            1  1  1   0   0     1      0      0         0         0
>> >> >> >> >> > 2            1 -1  1   0   0    -1      0      0         0         0
>> >> >> >> >> > 3            1  1 -1   0   0    -1      0      0         0         0
>> >> >> >> >> > 4            1 -1 -1   0   0     1      0      0         0         0
>> >> >> >> >> > 5            1  1  1   1   0     1      1      0         1         0
>> >> >> >> >> > 6            1 -1  1   1   0    -1      1      0        -1         0
>> >> >> >> >> > 7            1  1 -1   1   0    -1     -1      0        -1         0
>> >> >> >> >> > 8            1 -1 -1   1   0     1     -1      0         1         0
>> >> >> >> >> > 9            1  1  1   0   1     1      0      1         0         1
>> >> >> >> >> > 10           1 -1  1   0   1    -1      0      1         0        -1
>> >> >> >> >> > 11           1  1 -1   0   1    -1      0     -1         0        -1
>> >> >> >> >> > 12           1 -1 -1   0   1     1      0     -1         0         1
>> >> >> >> >> > attr(,"assign")
>> >> >> >> >> >  [1] 0 1 2 3 3 4 5 5 6 6
>> >> >> >> >> > attr(,"contrasts")
>> >> >> >> >> > attr(,"contrasts")$X3
>> >> >> >> >> > [1] "contr.treatment"
>> >> >> >> >> >
>> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix)
>> >> >> >> >> >    (Intercept) X1 X2 X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C
>> >> >> >> >> > 1            1  1  1   0   0     1      0      0         0         0
>> >> >> >> >> > 2            1 -1  1   0   0    -1      0      0         0         0
>> >> >> >> >> > 3            1  1 -1   0   0    -1      0      0         0         0
>> >> >> >> >> > 4            1 -1 -1   0   0     1      0      0         0         0
>> >> >> >> >> > 5            1  1  1   1   0     1      1      0         1         0
>> >> >> >> >> > 6            1 -1  1   1   0    -1     -1      0        -1         0
>> >> >> >> >> > 7            1  1 -1   1   0    -1      1      0        -1         0
>> >> >> >> >> > 8            1 -1 -1   1   0     1     -1      0         1         0
>> >> >> >> >> > 9            1  1  1   0   1     1      0      1         0         1
>> >> >> >> >> > 10           1 -1  1   0   1    -1      0     -1         0        -1
>> >> >> >> >> > 11           1  1 -1   0   1    -1      0      1         0        -1
>> >> >> >> >> > 12           1 -1 -1   0   1     1      0     -1         0         1
>> >> >> >> >> > attr(,"assign")
>> >> >> >> >> >  [1] 0 1 2 3 3 4 5 5 6 6
>> >> >> >> >> > attr(,"contrasts")
>> >> >> >> >> > attr(,"contrasts")$X3
>> >> >> >> >> > [1] "contr.treatment"
>> >> >> >> >> >
>> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix)
>> >> >> >> >> >    (Intercept) X1 X2 X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C
>> >> >> >> >> > 1            1  1  1   0   0      0      0      0      0         1         0         0
>> >> >> >> >> > 2            1 -1  1   0   0      0      0      0      0        -1         0         0
>> >> >> >> >> > 3            1  1 -1   0   0      0      0      0      0        -1         0         0
>> >> >> >> >> > 4            1 -1 -1   0   0      0      0      0      0         1         0         0
>> >> >> >> >> > 5            1  1  1   1   0      1      0      1      0         0         1         0
>> >> >> >> >> > 6            1 -1  1   1   0     -1      0      1      0         0        -1         0
>> >> >> >> >> > 7            1  1 -1   1   0      1      0     -1      0         0        -1         0
>> >> >> >> >> > 8            1 -1 -1   1   0     -1      0     -1      0         0         1         0
>> >> >> >> >> > 9            1  1  1   0   1      0      1      0      1         0         0         1
>> >> >> >> >> > 10           1 -1  1   0   1      0     -1      0      1         0         0        -1
>> >> >> >> >> > 11           1  1 -1   0   1      0      1      0     -1         0         0        -1
>> >> >> >> >> > 12           1 -1 -1   0   1      0     -1      0     -1         0         0         1
>> >> >> >> >> > attr(,"assign")
>> >> >> >> >> >  [1] 0 1 2 3 3 4 4 5 5 6 6 6
>> >> >> >> >> > attr(,"contrasts")
>> >> >> >> >> > attr(,"contrasts")$X3
>> >> >> >> >> > [1] "contr.treatment"
>> >> >> >> >> >
>> >> >> >> >> > --------------
>> >> >> >> >> >
>> >> >> >> >> > Here, we see that the encoding for the interaction
>> >> >> >> >> > X1:X2:X3 is now the interaction of X1 and X2 with a new
>> >> >> >> >> > encoding for X3, where each factor level is represented by
>> >> >> >> >> > its own column. I would expect, given the two-column
>> >> >> >> >> > encoding for X3, that the X1:X2:X3 term would also produce
>> >> >> >> >> > two columns regardless of which two-factor interactions we
>> >> >> >> >> > also specified, but in this case it switches to three. If
>> >> >> >> >> > other two-factor interactions are missing in addition to
>> >> >> >> >> > X1:X2, this issue still occurs. This also happens
>> >> >> >> >> > regardless of the contrast specified in contrasts.arg for
>> >> >> >> >> > X3. I don't see any reasoning for this behavior given in
>> >> >> >> >> > the documentation, so I suspect it is a bug.
>> >> >> >> >> >
>> >> >> >> >> > Best regards,
>> >> >> >> >> > Tyler Morgan-Wall
>> >> >> >> >> >
>> >> >> >> >> > [[alternative HTML version deleted]]
>> >> >> >> >> >
>> >> >> >> >> > ______________________________________________

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel