On 13/05/2008, at 4:09 AM, Douglas Bates wrote:

I'm entering this discussion late so I may be discussing issues that
have already been addressed.

As I understand it, Federico, you began by describing a model for data
in which two factors have a fixed set of levels and one factor has an
extensible, or "random", set of levels and you wanted to fit a model
that you described as

y ~ effect1 * effect2 * effect3

The problem is that this specification is not complete.

        At *last* (as Owl said to Rabbit) we're getting somewhere!!!

        I always knew that there was some basic fundamental point
        about this business that I (and I believe many others) were
        simply missing.  But I could not for the life of me get anyone
        to explain to me what that point was.  Or to put it another
        way, I was never able to frame a question that would illuminate
        just what it was that I wasn't getting.

        I now may be at a stage where I can start asking the right
        questions.

An interaction of factors with fixed levels and a factor with random
levels can mean, in the lmer specification,

lmer(y ~ effect1 * effect2 + (1| effect3) + (1| effect1:effect2:effect3), ...)

or

lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...)

or other variations.  When you specify a random effect or an random
interaction term you must, either explicitly or implicitly, specify
the form of the variance-covariance matrix associated with those
random effects.

The "advantage" that other software may provide for you is that it
chooses the model for you but that, of course, means that you only
have the one choice.

        Now may I start asking what I hope are questions that will lift
        the fog a bit?

        Let us for specificity consider a three-way model with two
        fixed effects and one random effect from the good old Rothamstead style
        agricultural experiment context:  Suppose we have a number of
        species/breeds of wheat (say) and a number of fertilizers.
        These are fixed effects.  And we have a number of fields (blocks)
        --- a random effect.  Each breed-fertilizer combination is
        applied a number of times in each field.  We ***assume*** that
        that the field or block effect is homogeneous throughout.  This
        may or may not be a ``good'' assumption, but it's not completely
        ridiculous and would often be made in practice.  And probably
        *was* made at Rothamstead.  The response would be something like
        yield in bushels per acre.

        The way that I would write the ``full'' model for this setting,
        in mathematical style is:

        Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k + (alpha.C)_ik
                    + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl

The alpha_i and beta_j are parameters corresponding to breed and fertilizer respectively; the C_k are random effects corresponding to fields or blocks.
        Any effect ``involving'' C is also random.

The assumptions made by the Package-Which-Must-Not-Be-Named are (I think)
        that

                C_k ~ N(0,sigma_C^2)
                (alpha.C)_ik ~ N(0,sigma_aC^2)
                (beta.C)jk ~ N(0,sigma_bC^2)
                (alpha.beta.C)_ijk ~ N(0,sigma_abC^2)
                E_ijkl ~ N(0,sigma^2)

        and these random variables are *all independent*.

        Ahhhhhhhh ... perhaps I'm on the way to answering my own question.  Is
        it this assumption of ``all independent'' which is questionable?  It
        seemed innocent enough when I first learned about this stuff, lo these
        many years ago.  But .... mmmmmaybe not!

        To start with:  What would be the lmer syntax to fit the foregoing
        (possibly naive) model?  I am sorry, but I really cannot get my head
        around the syntax of lmer model specification, and I've tried.  I
        really have.  Hard.  I know I must be starting from the wrong place,
        but I haven't a clue as to what the *right* place to start from is.
        And if I'm in that boat, I will wager Euros to pretzels that there
        are others in it.  I know that I'm not the brightest bulb in the
        chandelier, but I'm not the dullest either.

        Having got there:  Presuming that I'm more-or-less on the right track
in my foregoing conjecture that it's the over-simple dependence structure that is the problem with what's delivered by the Package-Which-Must- Not-Be-Named, how might one go about being less simple-minded? I.e. what might be some more realistic dependence structures, and how would one specify these in lmer?
        And how would one assess whether the assumed dependence structure gives
        a reasonable fit to the data?

If you can describe how many variance components you think should be
estimated in your model and what they would represent then I think it
will be easier to describe how to fit the model.

        How does this fit in with my conjecture (above) about what I've been
        missing all these years?  Does it fit?  How many variance components
        are there in the ``naive'' model?  It looks like 5 to me ... but maybe
I'm totally out to lunch in what I think I'm understanding at this stage. (And besides --- there are three sorts of statistician; those who can count,
        and those who can't.)

        Thank you for your indulgence.

                 cheers,

                        Rolf Turner

######################################################################
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to