cct663 <cct663 <at> gmail.com> writes: > I have a few questions relating to overdispersion in a sex ratio data set > that I am working with (note that I already have an analysis with GLMMs for > fixed effects, this is just to estimate dispersion). The response variable > is binomial because nestlings can only be male or female. I have samples of > 1-5 nestlings from each nest (individuals within a nest are not independent, > so the response variable is the ratio of sons to daughters) and some females > have multiple nests in the data set (so I need to include female identity as > a random effect). > > Here is an example of what the three vectors used in the model look like > (the real data set is much bigger, just to illustrate what I’m talking > about): > > male_chick_no=c(2,4,1,0,3,5,2) > female_chick_no=c(1,0,3,3,1,0,2) > FemaleID=c(A,A,B,B,C,D,E) > > My first question relates to coding the test in R. I received this suggested > R syntax from a reviewer: > > SexRatio = cbind(male_chick_no, female_chick_no) > > Model <- lmer(SexRatio ~ 1 +(1|FemaleID), family = quasibinomial) > > But when I try to use this in R I get the error: “Error in glmer(formula = > ratio ~ 1 + (1 | femid), family = quasibinomial) : "quasi" families cannot > be used in glmer”.
After many discussions of the unreliability of quasi-likelihood estimation in lme4, Doug Bates added this error/warning in a fairly recent version. The recommended approach now is to use individual-level random effects: to continue your example above, indiv <- 1:length(FemaleID) Model <- lmer(SexRatio ~ 1 + (1|indiv) + (1|FemaleID), family=binomial) By the way, it is generally better practice to put all of your data into a dataframe and use the data= argument to lmer. > My second question is more general: I understand that with binomial data > overdispersion suggests that the observed data have a greater variance than > expected given binomial errors (in my case this means that more nests would > be all male/all female than expected if sex is random). So with binomial > errors the expected estimate of dispersion is 1, if I find that dispersion > is > 1 it suggests that my data are overdispersed. My question is, how much > greater than 1 should that number be to conclude that the data are > overdispersed? Is there a rule of thumb or does it just depend on the > dataset? Very generally/crudely, the (squared) Pearson residuals are expected to be chi-square distributed. Specifically, if you know the residual degrees of freedom (which can be a bit tricky/poorly defined for mixed models, but is approximately (# data points)-(# estimated parameters), then sum(residuals^2) should be chi-squared distributed with df equal to the residual degrees of freedom. Venables and Ripley have a useful discussion. > I was thinking of doing a randomization test with the same structure (nest > size and female id) as my real data set but with sex ratio of each nest > randomized with a 50:50 chance of individuals being sons or daughters and > comparing my observed dispersion to the distribution of dispersions from the > randomization test. Would this be a valid way to ask whether my data are > overdispersed? Is it even necessary? It seems reasonable. You could go even farther and simulate data from your estimated model with binomial errors (i.e. use the estimated sex ratios rather than assuming a 50/50 sex ratio). Followups should probably be sent to the r-sig-mixed-models list. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.