Hello Tyler,
Thank you for searching for, and finding, the basic description of the
behavior of R in this matter.
I think your example is in agreement with the book.
But let me first note the following. You write: "F_j refers to a
factor (variable) in a model and not a categorical factor". However:
"a factor is a vector object used to specify a discrete
classification" (start of chapter 4 of "An Introduction to R"). You
might also look at the documentation of the R function factor().
You note that the book says about a factor F_j:
"... F_j is coded by contrasts if T_{i(j)} has appeared in the
formula and by dummy variables if it has not"
You find:
"However, the example I gave demonstrated that this dummy variable
encoding only occurs for the model where the missing term is the
numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2."
Here T_i = X1:X2:X3 and F_j = X3 (the only factor). Then T_{i(j)} = X1:X2,
which is dropped from the model, and no earlier term contains both X1 and
X2. Hence the X3 in T_i must be encoded by dummy variables, as indeed it is.
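
To make the rule concrete, here is a minimal sketch of it in Python. It only
illustrates the rule as quoted from the book and is not R's actual C code;
the function name term_coding and its arguments are made up for this
example, and the intercept is assumed to count as an empty preceding term.

```python
def term_coding(terms, categorical, intercept=True):
    """Apply the quoted rule to an ordered list of model terms.

    terms       -- terms in formula order, each an iterable of variable names
    categorical -- set of names that are factors (numeric ones stay uncoded)
    Returns {(term label, factor name): "contrasts" or "dummy"}.
    """
    # Assumption: the intercept acts as an empty term that precedes all others.
    preceding = [frozenset()] if intercept else []
    result = {}
    for term in terms:
        term = frozenset(term)
        for f in sorted(term & categorical):
            margin = term - {f}  # T_{i(j)}: the term with F_j dropped
            # "Appeared" means some earlier term contains every variable
            # of the margin.
            appeared = any(margin <= earlier for earlier in preceding)
            label = ":".join(sorted(term))
            result[(label, f)] = "contrasts" if appeared else "dummy"
        preceding.append(term)
    return result


main = [("X1",), ("X2",), ("X3",)]
# ~(X1+X2+X3)^3 - X1:X2, with X3 the only factor
without_x1x2 = main + [("X1", "X3"), ("X2", "X3"), ("X1", "X2", "X3")]
# ~(X1+X2+X3)^3 - X1:X3
without_x1x3 = main + [("X1", "X2"), ("X2", "X3"), ("X1", "X2", "X3")]

print(term_coding(without_x1x2, {"X3"})[("X1:X2:X3", "X3")])
print(term_coding(without_x1x3, {"X3"})[("X1:X2:X3", "X3")])
```

Under these assumptions the first call reports dummy coding for X3 in
X1:X2:X3 and the second reports contrasts, since in the second model the
margin X1:X2 is still present.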
Arie
On Tue, Oct 31, 2017 at 4:01 PM, Tyler wrote:
> Hi Arie,
>
> Thank you for your further research into the issue.
>
> Regarding Stata: On the other hand, JMP gives model matrices that use the
> main effects contrasts in computing the higher order interactions, without
> the dummy variable encoding. I verified this both by analyzing the linear
> model given in my first example and noting that JMP has one more degree of
> freedom than R for the same model, as well as looking at the generated model
> matrices. It's easy to find a design where JMP will allow us to fit our model
> with goodness-of-fit estimates and R will not, due to the extra degree(s) of
> freedom required. Let's keep the conversation limited to R.
>
> I want to refocus back onto my original bug report, which was not for a
> missing main effects term, but rather for a missing lower-order interaction
> term. The behavior of model.matrix.default() for a missing main effects term
> is a nice example to demonstrate how model.matrix encodes with dummy
> variables instead of contrasts, but doesn't demonstrate the inconsistent
> behavior my bug report highlighted.
>
> I went looking for documentation on this behavior, and the issue stems not
> from model.matrix.default(), but rather from how the terms() function
> interprets the formula. This "clever" replacement of contrasts by dummy variables to
> maintain marginality (presuming that's the reason) is not described anywhere
> in the documentation for either the model.matrix() or the terms() function.
> In order to find a description for the behavior, I had to look in the
> underlying C code, buried above the "TermCode" function of the "model.c"
> file, which says:
>
> "TermCode decides on the encoding of a model term. Returns 1 if variable
> ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2 if it is to
> be encoded by dummy variables. This is decided using the heuristic
> described in Statistical Models in S, page 38."
>
> I do not have a copy of this book, and I suspect most R users do not
> either. Thankfully, however, some of the pages describing this behavior were
> available as part of Amazon's "Look Inside" feature--but if not for that, I
> would have no idea what heuristic R was using. Since those pages could be made
> unavailable by Amazon at any time, at the very least we have a problem with
> a lack of documentation.
>
> However, I still believe there is a bug when comparing R's implementation to
> the heuristic described in the book. From Statistical Models in S, pages
> 38-39:
>
> "Suppose F_j is any factor included in term T_i. Let T_{i(j)} denote the
> margin of T_i for factor F_j--that is, the term obtained by dropping F_j
> from T_i. We say that T_{i(j)} has appeared in the formula if there is some
> term T_i' for i' < i such that T_i' contains all the factors appearing in
> T_{i(j)}. The usual case is that T_{i(j)} itself is one of the preceding
> terms. Then F_j is coded by contrasts if T_{i(j)} has appeared in the
> formula and by dummy variables if it has not"
>
> Here, F_j refers to a factor (variable) in a model and not a categorical
> factor, as specified later in that section (page 40): "Numeric variables
> appear in the computations as themselves, uncoded. Therefore, the rule does
> not do anything special for them, and it remains valid, in a trivial sense,
> whenever any of the F_j is numeric rather than categorical."
>
> Going back to my original example with three variables: X1 (numeric), X2
> (numeric), X3 (categorical). This heuristic prescribes encoding X1:X2:X3
> with contrasts as long as X1:X2, X1:X3, and X2:X3 exist in the formula. When
> any of the preceding terms do not exist, this heuristic tells us to use
> dummy variables to encode the interaction (e.g. "F_j [the interaction term]
> is coded ... by dummy variables if it [any of the marginal terms obtained by
> dropping a single factor in the interaction] has not [appeared in the