Re: [R] why NA coefficients

array chip Tue, 08 Nov 2011 10:59:39 -0800

Hi Dennis, Thanks very much for the details. All those explanations about 
non-estimable mu_{12} when it has no data make sense to me!

Regarding my specific example data where mu_{12} should NOT be estimable in a 
linear model with interaction because it has no data, yet the linear model I 
created by using lm() in R still CAN estimate the mean mu_{12}, while on the 
other hand, mu_{72} is instead NOT estimable from lm() even this category does 
have data. Does this contradiction to the theory imply that the linear model by 
lm() in R on my specific example data is NOT reliable/trustable and should not 
be used?

Thanks

John

________________________________
From: Dennis Murphy <djmu...@gmail.com>

Cc: "r-help@r-project.org" <r-help@r-project.org>
Sent: Tuesday, November 8, 2011 10:22 AM
Subject: Re: [R] why NA coefficients

The cell mean mu_{12} is non-estimable because it has no data in the
cell. How can you estimate something that's not there (at least
without imputation :)? Every parametric function that involves mu_{12}
will also be non-estimable - in particular,  the interaction term and
the population marginal means . That's why you get the NA estimates
and the warning. All this follows from the linear model theory
described in, for example, Milliken and Johnson (1992), Analysis of
Messy Data, vol. 1, ch. 13.

Here's an example from Milliken and Johnson (1992) to illustrate:
          B1         B2       B3
T1      2, 6                   8, 6
T2        3          14      12, 9
T3        6           9

Assume a cell means model E(Y_{ijk}) = \mu_{ij}, where the cell means
are estimated by the cell averages.

From M & J (p. 173, whence this example is taken):
"Whenever treatment combinations are missing, certain
hypotheses cannot be tested without making some
additional assumptions about the parameters in the model.
Hypotheses involving parameters corresponding to the
missing cells generally cannot be tested. For example,
for the data [above] it is not possible to estimate any
linear combinations (or to test any hypotheses) that
involve parameters \mu_{12} and \mu_{33} unless one
is willing to make some assumptions about them."

They continue:
"One common assumption is that there is no
interactions between the levels of T and the levels of B.
In our opinion, this assumption should not be made
without some supporting experimental evidence."

In other words, removing the interaction term makes the
non-estimability problem disappear, but it's a copout unless there is
some tangible scientific justification for an additive rather than an
interaction model.

For the above data, M & J note that it is not possible to estimate all
of the expected marginal means - in particular, one cannot estimate
the population marginal means $\bar{\mu}_{1.}$, $\bar{\mu}_{3.}$,
$\bar{\mu}_{.2}$ or $\bar{\mu}_{.3}$. OTOH, $\bar{\mu}_{2.}$ and
$\bar{\mu}_{.1}$ since these functions of the parameters involve terms
associated with the means of the missing cells. Moreover, any
hypotheses involving parametric functions that contain non-estimable
cell means are not testable. In this example, the test of equal row
population marginal means is not testable because $\bar{\mu}_{1.}$ and
$\bar{\mu}_{3.}$ are not estimable.

[Aside: if the term parametric function is not familiar, in this
context it refers to linear combinations of model parameters.  In the
M & J example, $\bar{\mu}_{1.} = \mu_{11} + \mu_{12} + \mu_{13}$ is a
parametric function.]

Hopefully this sheds some light on the situation.

Dennis

> Hi Dennis,
> The cell mean mu_12 from the model involves the intercept and factor 2:
> Coefficients:
>                   (Intercept)                 factor(treat)2
> factor(treat)3
>                      0.429244
> 0.499982                       0.352971
>                factor(treat)4                 factor(treat)5
> factor(treat)6
>                     -0.204752
> 0.142042                       0.044155
>                factor(treat)7                 factor(group)2
> factor(treat)2:factor(group)2
>                     -0.007775
> -0.337907                      -0.208734
> factor(treat)3:factor(group)2  factor(treat)4:factor(group)2
> factor(treat)5:factor(group)2
>                     -0.195138
> 0.800029                       0.227514
> factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
>                      0.331548                             NA
> So mu_12 = 0.429244-0.337907 = 0.091337. This can be verified by:
>> predict(fit,data.frame(list(treat=1,group=2)))
>          1
> 0.09133691
> Warning message:
> In predict.lm(fit, data.frame(list(treat = 1, group = 2))) :
>   prediction from a rank-deficient fit may be misleading
>
> But as you can see, it gave a warning about rank-deficient fit... why this
> is a rank-deficient fit?
> Because "treat 1_group 2" has no cases, so why it is still estimable while
> on the contrary, "treat 7_group 2" which has 2 cases is not?
> Thanks
> John
>
>
>
>
> ________________________________
> From: Dennis Murphy <djmu...@gmail.com>

> Sent: Monday, November 7, 2011 9:29 PM
> Subject: Re: [R] why NA coefficients
>
> Hi John:
>
> What is the estimate of the cell mean \mu_{12}? Which model effects
> involve that cell mean? With this data arrangement, the expected
> population marginal means of treatment 1 and group 2 are not estimable
> either, unless you're willing to assume a no-interaction model.
>
> Chapters 13 and 14 of Milliken and Johnson's Analysis of Messy Data
> (vol. 1) cover this topic in some detail, but it assumes you're
> familiar with the matrix form of a linear statistical model. Both
> chapters cover the two-way model with interaction - Ch.13 from the
> cell means model approach and Ch. 14 from the model effects approach.
> Because this was written in the mid 80s and republished in the early
> 90s, all the code used is in SAS.
>
> HTH,
> Dennis
>

>> Thanks David. The only category that has no cases is "treat 1-group 2":
>>
>>> with(test,table(treat,group))
>>      group
>> treat 1 2
>>     1 8 0
>>     2 1 5
>>     3 5 5
>>     4 7 3
>>     5 7 4
>>     6 3 3
>>     7 8 2
>>
>> But why the coefficient for "treat 7-group 2" is not estimable?
>>
>> Thanks
>>
>> John
>>
>>
>>
>>
>> ________________________________
>> From: David Winsemius <dwinsem...@comcast.net>
>>
>> Cc: "r-help@r-project.org" <r-help@r-project.org>
>> Sent: Monday, November 7, 2011 5:13 PM
>> Subject: Re: [R] why NA coefficients
>>
>>
>> On Nov 7, 2011, at 7:33 PM, array chip wrote:
>>
>>> Hi, I am trying to run ANOVA with an interaction term on 2 factors (treat
>>> has 7 levels, group has 2 levels). I found the coefficient for the last
>>> interaction term is always 0, see attached dataset and the code below:
>>>
>>>> test<-read.table("test.txt",sep='\t',header=T,row.names=NULL)
>>>> lm(y~factor(treat)*factor(group),test)
>>>
>>> Call:
>>> lm(formula = y ~ factor(treat) * factor(group), data = test)
>>>
>>> Coefficients:
>>>                   (Intercept)                 factor(treat)2
>>>    factor(treat)3
>>>                      0.429244                       0.499982
>>>          0.352971
>>>                factor(treat)4                 factor(treat)5
>>>    factor(treat)6
>>>                     -0.204752                       0.142042
>>>          0.044155
>>>                factor(treat)7                 factor(group)2
>>> factor(treat)2:factor(group)2
>>>                     -0.007775                      -0.337907
>>>         -0.208734
>>> factor(treat)3:factor(group)2  factor(treat)4:factor(group)2
>>> factor(treat)5:factor(group)2
>>>                     -0.195138                       0.800029
>>>          0.227514
>>> factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
>>>                      0.331548                             NA
>>>
>>>
>>> I guess this is due to model matrix being singular or collinearity among
>>> the matrix columns? But I can't figure out how the matrix is singular in
>>> this case? Can someone show me why this is the case?
>>
>> Because you have no cases in one of the crossed categories.
>>
>> --David Winsemius, MD
>> West Hartford, CT
>>        [[alternative HTML version deleted]]
>>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] why NA coefficients

Reply via email to