Hi, I have a data set with 999 observations, for each of them I have data on four variables: site, colony, gender (quite a few NA values), and cohort.
This is how the data set looks like: > str(dispersal) 'data.frame': 999 obs. of 4 variables: $ site : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 2 2 ... $ gender: Factor w/ 2 levels "0","1": NA NA 2 1 2 NA 1 2 2 NA ... $ colony: Factor w/ 2 levels "main","other": 2 2 2 2 2 2 2 2 2 2 ... $ cohort: Factor w/ 11 levels "1996","2000",..: 8 8 8 8 8 8 8 8 8 6 ... Now, I want to estimate if sites 1 and site 2 differ on some of the other variables. For instance there are relatively more males in site 1 with respect to site 2, more individuals of the main colony in site 2 with respect to site 1 and this sort of things. I thought I might do a binomial GLM considering as response variable the site, I tried to run the more general model to have a look to overdispersion but I believe there is something wrong even before worrying about overdispersion. I know (I did a chisq.test) that cohort2004 is very diversly represented between the two sites but it is not reflected in the results of GLM. Here there are the results of chisq.test and the GLM: 1) /> age_cohort<-as.table(rbind(c(142,95,46,33,14,59,18,12,7,1,0),c(258,144,54,70,20,11,6,8,2,3,1))) > dimnames(age_cohort)<-list(site=c("M","D"), + cohorts=c(2010,2009,2008,2007,2006,2004,2003,2002,2001,2000,1996)) > age_cohort cohorts site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996 M 142 95 46 33 14 59 18 12 7 1 0 D 258 144 54 70 20 11 6 8 2 3 1 > (Xsqagec <- chisq.test(age_cohort)) # Prints test summary Pearson's Chi-squared test data: age_cohort X-squared = 82.6016, df = 10, p-value = 1.549e-13 Mensajes de aviso perdidos In chisq.test(age_cohort) : Chi-squared approximation may be incorrect > Xsqagec$observed # observed counts cohorts site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996 M 142 95 46 33 14 59 18 12 7 1 0 D 258 144 54 70 20 11 6 8 2 3 1 > Xsqagec$expected # expected counts under the null cohorts site 2010 2009 2008 2007 2006 2004 2003 M 170.1195 101.6464 42.52988 43.80578 14.46016 29.77092 10.20717 D 229.8805 137.3536 57.47012 59.19422 19.53984 40.22908 13.79283 cohorts site 2002 2001 2000 1996 M 8.505976 3.827689 1.701195 0.4252988 D 11.494024 5.172311 2.298805 0.5747012 > Xsqagec$residuals # Pearson residuals cohorts site 2010 2009 2008 2007 2006 M -2.1559111 -0.6592367 0.5321050 -1.6326395 -0.1210101 D 1.8546283 0.5671101 -0.4577448 1.4044825 0.1040993 cohorts site 2004 2003 2002 2001 2000 M 5.3569686 2.4391720 1.1980192 1.6214643 -0.5376032 D -4.6083465 -2.0983042 -1.0305993 -1.3948690 0.4624746 cohorts site 1996 M -0.6521494 D 0.5610132 > Xsqagec$stdres # standardized residuals cohorts site 2010 2009 2008 2007 2006 M -3.6665549 -0.9962228 0.7397057 -2.2733888 -0.1623984 D 3.6665549 0.9962228 -0.7397057 2.2733888 0.1623984 cohorts site 2004 2003 2002 2001 2000 M 7.3264142 3.2566808 1.5962909 2.1485311 -0.7105713 D -7.3264142 -3.2566808 -1.5962909 -2.1485311 0.7105713 cohorts site 1996 M -0.8606814 D 0.8606814 / 2) /> model1<-glm(site~gender*colony*cohort,binomial) > summary(model1) Call: glm(formula = site ~ gender * colony * cohort, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.84648 -0.96954 -0.00036 1.11269 2.03933 Coefficients: (12 not defined because of singularities) Estimate Std. Error z value (Intercept) -1.657e+01 2.400e+03 -0.007 gender1 -2.231e-01 9.220e-01 -0.242 colonyother 9.531e-02 8.006e-01 0.119 cohort2002 1.717e-08 3.393e+03 0.000 cohort2003 1.766e+01 2.400e+03 0.007 cohort2004 1.807e+01 2.400e+03 0.008 cohort2006 1.697e+01 2.400e+03 0.007 cohort2007 1.726e+01 2.400e+03 0.007 cohort2008 1.606e+01 2.400e+03 0.007 cohort2009 1.657e+01 2.400e+03 0.007 cohort2010 1.587e+01 2.400e+03 0.007 gender1:colonyother 9.163e-01 1.087e+00 0.843 gender1:cohort2002 1.719e+01 2.400e+03 0.007 gender1:cohort2003 -1.823e-01 1.713e+00 -0.106 gender1:cohort2004 2.231e-01 1.329e+00 0.168 gender1:cohort2006 -5.878e-01 1.586e+00 -0.371 gender1:cohort2007 -6.454e-02 1.784e+00 -0.036 gender1:cohort2008 8.881e-01 1.156e+00 0.768 gender1:cohort2009 -2.817e-02 1.199e+00 -0.023 gender1:cohort2010 NA NA NA colonyother:cohort2002 NA NA NA colonyother:cohort2003 NA NA NA colonyother:cohort2004 1.497e+01 1.697e+03 0.009 colonyother:cohort2006 -1.707e+01 2.400e+03 -0.007 colonyother:cohort2007 -7.885e-01 1.772e+00 -0.445 colonyother:cohort2008 NA NA NA colonyother:cohort2009 NA NA NA colonyother:cohort2010 NA NA NA gender1:colonyother:cohort2002 NA NA NA gender1:colonyother:cohort2003 NA NA NA gender1:colonyother:cohort2004 -9.163e-01 2.400e+03 0.000 gender1:colonyother:cohort2006 NA NA NA gender1:colonyother:cohort2007 -2.575e+00 2.379e+00 -1.082 gender1:colonyother:cohort2008 NA NA NA gender1:colonyother:cohort2009 NA NA NA gender1:colonyother:cohort2010 NA NA NA Pr(>|z|) (Intercept) 0.994 gender1 0.809 colonyother 0.905 cohort2002 1.000 cohort2003 0.994 cohort2004 0.994 cohort2006 0.994 cohort2007 0.994 cohort2008 0.995 cohort2009 0.994 cohort2010 0.995 gender1:colonyother 0.399 gender1:cohort2002 0.994 gender1:cohort2003 0.915 gender1:cohort2004 0.867 gender1:cohort2006 0.711 gender1:cohort2007 0.971 gender1:cohort2008 0.442 gender1:cohort2009 0.981 gender1:cohort2010 NA colonyother:cohort2002 NA colonyother:cohort2003 NA colonyother:cohort2004 0.993 colonyother:cohort2006 0.994 colonyother:cohort2007 0.656 colonyother:cohort2008 NA colonyother:cohort2009 NA colonyother:cohort2010 NA gender1:colonyother:cohort2002 NA gender1:colonyother:cohort2003 NA gender1:colonyother:cohort2004 1.000 gender1:colonyother:cohort2006 NA gender1:colonyother:cohort2007 0.279 gender1:colonyother:cohort2008 NA gender1:colonyother:cohort2009 NA gender1:colonyother:cohort2010 NA (Dispersion parameter for binomial family taken to be 1) Null deviance: 311.91 on 224 degrees of freedom Residual deviance: 271.61 on 201 degrees of freedom (774 observations deleted due to missingness) AIC: 319.61 Number of Fisher Scoring iterations: 15 / I thought that perhaps keeping the gender as explanatory variable was reducing a lot the sample size and it was the matter (I removed it): /> model1<-glm(site~colony*cohort,binomial) > summary(model1) Call: glm(formula = site ~ colony * cohort, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.8683 -0.9712 -0.8757 1.3468 1.7470 Coefficients: (6 not defined because of singularities) Estimate Std. Error z value Pr(>|z|) (Intercept) -15.5661 1455.3976 -0.011 0.991 colonyother 0.2544 0.2167 1.174 0.240 cohort2000 14.4675 1455.3981 0.010 0.992 cohort2001 16.8188 1455.3978 0.012 0.991 cohort2002 15.9715 1455.3977 0.011 0.991 cohort2003 16.6647 1455.3977 0.011 0.991 cohort2004 17.1194 1455.3976 0.012 0.991 cohort2006 15.2607 1455.3976 0.010 0.992 cohort2007 14.9470 1455.3976 0.010 0.992 cohort2008 15.3837 1455.3976 0.011 0.992 cohort2009 15.1762 1455.3976 0.010 0.992 cohort2010 14.8053 1455.3976 0.010 0.992 colonyother:cohort2000 NA NA NA NA colonyother:cohort2001 NA NA NA NA colonyother:cohort2002 NA NA NA NA colonyother:cohort2003 NA NA NA NA colonyother:cohort2004 13.7583 550.0887 0.025 0.980 colonyother:cohort2006 -15.5151 1455.3976 -0.011 0.991 colonyother:cohort2007 -0.9163 0.5979 -1.533 0.125 colonyother:cohort2008 NA NA NA NA colonyother:cohort2009 -0.6912 0.5214 -1.326 0.185 colonyother:cohort2010 NA NA NA NA (Dispersion parameter for binomial family taken to be 1) Null deviance: 1361.4 on 998 degrees of freedom Residual deviance: 1267.9 on 983 degrees of freedom AIC: 1299.9 Number of Fisher Scoring iterations: 14 / Any comment/suggestion on this? Thanks for any help -- View this message in context: http://r.789695.n4.nabble.com/Binomial-GLM-chisq-test-or-tp4608941.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.