Thanks for all the answers; they were really helpful. I noticed the inflation of the variance, and that does indeed seem to be the problem (thank you, Thomas). When looking at the tables again (thank you, Ted), I noticed that this is actually a classic case of (quasi-)complete separation, hence the inflation of the variance.
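For anyone who wants to look at the tables themselves, here is a minimal sketch. The cell counts are the ones quoted further down in this thread; the data.frame() construction and the object names (tabs, min_cell, p_free) are only illustrative assumptions, not the code that was originally posted. It prints one 2x2 table per construct, shows how small the smallest cell in each table is, and recomputes the hypergeometric probabilities Ted mentions for the "free" table given its margins.

```r
# Cell counts as quoted in this thread (one row per cell of the 2x2x2 layout).
testdata <- data.frame(
  spot       = rep(0:1, times = 4),
  constr     = rep(c("FLC", "FLC", "free", "free"), times = 2),
  vernalized = rep(c("no", "yes"), each = 4),
  Freq       = c(3, 42, 1, 44, 27, 20, 3, 42)
)

# One vernalized-by-spot table for each construct.
tabs <- xtabs(Freq ~ vernalized + spot + constr, data = testdata)
print(tabs)

# Smallest cell count per construct: very small cells are what make the
# standard errors of a logistic regression blow up.
min_cell <- apply(tabs, 3, min)
print(min_cell)

# "free" table, margins fixed: 45 plants per vernalization level and
# 4 plants without spots in total. Probability that 0, 1, ..., 4 of those
# 4 plants fall in the vernalized = "yes" row.
p_free <- dhyper(0:4, m = 45, n = 45, k = 4)
print(round(p_free, 3))
```

The probabilities are symmetric around two plants per row, which is why the observed 3/1 split carries so little information about vernalization in the "free" construct.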
As I cannot redo the experiment (it's not my own data, I'm just the statistician), I had to rethink my approach. Although I violate a lot of assumptions that way, I decided to model the probability of finding spots with a linear approach (using glm to carry out an ANOVA). Instead of the F-test, I chose the likelihood ratio test to assess the significance of my terms, as I believe it is more robust to violations of the assumptions a linear glm makes.

I checked both the predictions and the residual variance. The predictions were highly accurate, and the residual variance was close to homogeneous, so I had no reason to believe the model behaved badly on this dataset. With this approach my variances were within reasonable bounds, and all terms, including the interaction term, are highly significant. A TukeyHSD as a post hoc test confirmed the strong difference in the table of the FLC construct.

As the ANOVA and TukeyHSD tend to lose power when the assumptions are violated, I believe it's safe to trust a highly significant result, even when the approach is not optimal.

Thank you again for the help.

Kind regards
Joris

On Sat, Mar 7, 2009 at 3:20 PM, Ted Harding <ted.hard...@manchester.ac.uk> wrote:

> On 07-Mar-09 10:57:17, Thomas Lumley wrote:
> > On Fri, 6 Mar 2009, joris meys wrote:
> >> Dear all,
> >> I have a dataset where the interaction is more than obvious,
> >> but I was asked to give a p-value, so I ran a logistic regression
> >> using glm. Very funny, in the outcome the interaction term is NOT
> >> significant, although that's completely counterintuitive. There
> >> are 3 variables : spot (binary response), constr (gene construct)
> >> and vernalized (growth conditions). Only for the FLC construct
> >> after vernalization, the chance of spots should be lower. So in
> >> the model one would suspect the interaction term to be significant.
> >>
> >> Yet, only the two main terms are significant here. Can it be my
> >> data is too sparse to use these models? Am I using the wrong method?
> >
> > The point estimate for the interaction term is large: 1.79, or an
> > odds ratio of nearly 6.
> >
> > The data are very strongly overdispersed (variance is 45 times larger
> > than it should be), so they don't fit a binomial model well. If you
> > used a quasibinomial model you would get no statistical significance
> > for any of the terms.
> >
> > I would say the problem is partly a combination of the overdispersion
> > and the sample size. It doesn't help that the situation appears to be
> > a difference between the FLC:yes cell and the other three cells, a
> > difference that is spread out over the three parameters.
> > -thomas
>
> The following way of looking at it may be helpful. Display the data
> as two 2x2 tables (one for each level of 'constr'):
>
>                   Spot                             Spot
>   constr="FLC"    1    0         constr="free"    1    0
>   --------------+-------+---     --------------+-------+---
>   Vern = "yes": |20   27| 47     Vern = "yes": |42    3| 45
>                 |       |                      |       |
>   Vern = "no" : |42    3| 45     Vern = "no" : |44    1| 45
>   --------------+-------+---     --------------+-------+---
>                 |62   30| 92                   |86    4| 90
>
> It seems clear that, in the constr="free" table, there is a close
> approximation to no information about the relationship between
> 'vernalized' and 'spot'. Given the margins, even the most extreme
> possible tables (by col: (45,41)/(0,4) and (41,45)/(4,0)) have
> probabilities 0.058 of occurring. Other possibilities give
> probabilities 0.250, 0.384, 0.250.
>
> On the other hand, the constr="FLC" table shows a very marked
> association between 'vernalized' and 'spot'.
>
> But, given that there is not much information on the "free" table,
> you are not going to find an interaction between 'constr' and
> 'vernalized'. (You could try out the glm() for each of the possible
> "free" tables, given the margins).
>
> So, in my view, the aetiology of the symptoms is hypospotification
> in the "free" lifestyle ... Treatment: Increase your intake of
> "free"! Then you may get enough information about association in
> that case.
>
> Ted.
>
> >> # data generation
> >> # (built with data.frame() so that spot and Freq stay numeric;
> >> # a matrix would coerce every column to character)
> >> testdata <- data.frame(
> >>   spot = rep(0:1, times = 4),
> >>   constr = rep(c("FLC", "FLC", "free", "free"), times = 2),
> >>   vernalized = rep(c("no", "yes"), each = 4),
> >>   Freq = c(3, 42, 1, 44, 27, 20, 3, 42))
> >>
> >> # model: logistic regression on the aggregated counts
> >> T0fit <- glm(spot ~ constr * vernalized, weights = Freq,
> >>              data = testdata, family = "binomial")
> >> anova(T0fit)
> >>
> >> Kind regards
> >> Joris
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > Thomas Lumley                   Assoc. Professor, Biostatistics
> > tlum...@u.washington.edu        University of Washington, Seattle
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 07-Mar-09                                       Time: 14:14:06
> ------------------------------ XFMail ------------------------------
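
The linear approach described at the top of this message could look roughly like the sketch below. It is an illustration under stated assumptions, not the code actually used for the analysis: the data.frame() construction, the expansion to one row per plant, and the object names (long, lin_fit, lrt, tuk) are all made up here. It fits a Gaussian glm to the 0/1 response, tests the terms sequentially with a likelihood ratio test, looks at the residual variance per cell, and runs TukeyHSD on the four construct-by-vernalization cells.

```r
# Cell counts as quoted in this thread; factors are set explicitly so that
# aov() and TukeyHSD() treat them as grouping variables.
testdata <- data.frame(
  spot       = rep(0:1, times = 4),
  constr     = factor(rep(c("FLC", "FLC", "free", "free"), times = 2)),
  vernalized = factor(rep(c("no", "yes"), each = 4)),
  Freq       = c(3, 42, 1, 44, 27, 20, 3, 42)
)

# Expand the aggregated counts to one row per plant (assumption: each count
# is a number of independent plants).
long <- testdata[rep(seq_len(nrow(testdata)), times = testdata$Freq), ]

# Linear model for the probability of spots, fitted through glm().
lin_fit <- glm(spot ~ constr * vernalized, family = gaussian, data = long)

# Sequential likelihood ratio tests instead of F-tests.
lrt <- anova(lin_fit, test = "LRT")
print(lrt)

# Residual variance per cell, as a rough check on homogeneity.
cell <- interaction(long$constr, long$vernalized)
print(tapply(residuals(lin_fit), cell, var))

# Post hoc comparison of the four cells.
tuk <- TukeyHSD(aov(spot ~ constr * vernalized, data = long),
                which = "constr:vernalized")
print(tuk)
```

The design is only slightly unbalanced (47 plants in one cell, 45 in the others), which is the situation TukeyHSD's sample-size adjustment is meant to handle; with a strongly unbalanced design a different post hoc procedure would be worth considering.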