On Mon, Oct 5, 2009 at 11:57 AM, Bert Gunter <gunter.ber...@gene.com> wrote:
[snip]
> -- ... and if the correlations are "high" it tells you that your model may
> be near unidentifiable = the model parameters may not be effectively
> estimated from the data. To understand what "high", "near" and "effectively"
> may mean for your data, CYLS ("Consult your local statistician")
>
> (If you really wish to use sophisticated tools like glmer, you really need
> to understand what you're doing. There is no guarantee of immunity from the
> consequences of ignorance.)
Indeed. I was simply trying to answer the 'basic' linear models questions and staying away from judgements. However, since you bring it up, I'll go ahead and climb up on a soapbox ;-)

\begin{rant}
I think it is a problem in ecology (and, I'm sure, other fields) that there is huge demand for tools allowing inferences about complex systems, yet very few have the skills necessary to use the tools provided by statisticians safely. For example, a common situation for an ecologist is to be faced with analyzing observational data with temporal and spatial non-independence of observations, lack of balance, lack of normality, and often zero inflation and/or under- or over-dispersion. Reviewers know enough to understand the problems this presents for classical techniques, and therefore the use of complex tools (such as mixed models or hierarchical Bayesian models) can become a prerequisite to getting published. In other words, careers depend on using tools that ecologists who spend their time focused on ecology rather than mathematical statistics have little hope of truly understanding. This is certainly no jab at the intelligence of ecologists; it's just that when you get into areas such as drawing inferences from a GLMM, even the proportion of statisticians who understand the subtleties and pitfalls is small, and when you throw in, say, zero inflation and spatially structured covariance matrices, that small proportion dwindles drastically.
\end{rant}

So, I suppose what I should have done after mentioning the LRT was to provide this list I sent to r-sig-ecology a while back (with an LMM in mind):

- LRTs aren't valid for comparing REML fits with different fixed effects. REML essentially maximizes the likelihood of error contrasts A'Y, where A'X = 0 so that E[A'Y] = 0; changing the fixed effects (X) changes A', which changes the data being modeled and makes the likelihoods non-comparable.
- Pinheiro and Bates (2000, pp. 87-88) recommend that LRTs using the standard chi-square reference distribution not be used to compare ML fits with different fixed effects, because the tests can be very "anticonservative", particularly as the number of parameters being removed becomes large relative to the number of observations.
- LRTs for differences in the random part of the model, when the fixed effects are the same, can be conservative because the null value of 0 lies on the boundary of the variance parameter space.
- It also seems that counting the number of parameters being estimated (and hence the degrees of freedom for the test) is a problem when comparing models that differ in their random effects.

best,
Kingsford Jones

>
> -- Bert
>
> hth,
>
> Kingsford
>
>
>> Many thanks for any help.
>>
>> Cheers,
>> Umesh Srinivasan,
>> Bangalore, India
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
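To make the boundary point in the list above concrete: when a single variance component is tested (H0: sigma^2 = 0) and the usual regularity conditions otherwise hold, the asymptotic null distribution of the LRT statistic is a 50:50 mixture of chi-square(0) (a point mass at zero) and chi-square(1). Referring the statistic to chi-square(1) therefore doubles the p-value, which is why the naive test is conservative. The sketch below is in Python only so that it is self-contained without lme4; the function names are hypothetical, and the LRT statistic is assumed to be 2 * (logLik(full) - logLik(reduced)) from two fits with the same fixed effects (and, per the first item, both fitted by REML or both by ML).

```python
import math


def chi2_1_sf(x: float) -> float:
    """Upper-tail probability P(chi^2_1 > x) for a chi-square with 1 df."""
    if x <= 0:
        return 1.0
    # If Z ~ N(0, 1), then P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def lrt_pvalues_one_variance(lrt_stat: float) -> tuple[float, float]:
    """Naive and boundary-adjusted p-values for H0: sigma^2 = 0.

    Hypothetical helper. Assumes exactly one variance component is dropped,
    so the asymptotic null distribution of the LRT statistic is the 50:50
    mixture 0.5 * chi^2_0 + 0.5 * chi^2_1.
    """
    naive = chi2_1_sf(lrt_stat)
    # chi^2_0 is a point mass at 0, so it contributes nothing to the tail
    # beyond a positive statistic; a statistic of exactly 0 gives p = 1.
    mixture = 0.5 * chi2_1_sf(lrt_stat) if lrt_stat > 0 else 1.0
    return naive, mixture


if __name__ == "__main__":
    # Hypothetical LRT statistic, for illustration only.
    naive, mixture = lrt_pvalues_one_variance(3.0)
    print(f"naive chi^2_1 p = {naive:.3f}, 50:50 mixture p = {mixture:.3f}")
    # naive chi^2_1 p = 0.083, 50:50 mixture p = 0.042
```

This only covers the single-variance-component case. When more is dropped at once, for example a random slope together with its covariance with the random intercept, the null distribution is a different mixture, so the helper should not be reused as is.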