Thanks for all the answers; they were really helpful. I noticed the
inflation of the variance, and that does seem to be the problem (thank
you, Thomas). Looking at the tables again (thank you, Ted), I noticed
that this is actually a classic case of (quasi-)complete separation,
hence the inflation of the variance.

As I cannot redo the experiment (it's not my own data; I'm just the
statistician), I had to rethink my approach. Although it violates a lot
of assumptions, I decided to model the probability of finding spots with
a linear model (using glm to carry out an ANOVA). Instead of the F-test,
I used the likelihood ratio test for the significance of the terms, as I
believe it is more robust to violations of the assumptions a linear glm
makes. I checked both the predictions and the residual variance. The
predictions were highly accurate, and the residual variance was close to
homogeneous. I had no reason to believe the model behaved badly on this
dataset.
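A minimal sketch of the linear approach described above, assuming the cell counts from the original post (quoted further down in this thread). The object names cells, plants, lin_fit and cell_var are illustrative. The counts are expanded to one row per plant so that the residual variance is estimated from all 182 observations rather than from 8 weighted rows; the likelihood ratio tests come from anova(..., test = "Chisq"), and tapply() gives the residual variance within each cell.

```r
# Cell counts as posted: one row per spot x constr x vernalized combination
cells <- data.frame(
  spot       = rep(0:1, times = 4),
  constr     = factor(rep(c("FLC", "FLC", "free", "free"), times = 2)),
  vernalized = factor(rep(c("no", "yes"), each = 4)),
  Freq       = c(3, 42, 1, 44, 27, 20, 3, 42)
)

# Expand to one row per plant (182 rows), so residual df count plants, not cells
plants <- cells[rep(seq_len(nrow(cells)), times = cells$Freq),
                c("spot", "constr", "vernalized")]

# Linear probability model: a Gaussian glm on the 0/1 spot indicator
lin_fit <- glm(spot ~ constr * vernalized, family = gaussian, data = plants)

# Sequential likelihood ratio (chi-square) tests instead of F tests
lr_table <- anova(lin_fit, test = "Chisq")
print(lr_table)

# Residual variance within each constr x vernalized cell
cell <- interaction(plants$constr, plants$vernalized)
cell_var <- tapply(residuals(lin_fit), cell, var)
print(round(cell_var, 3))
```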

With this approach, the variances were within reasonable bounds, and all
terms, including the interaction term, were highly significant. A Tukey
HSD post hoc test (TukeyHSD) confirmed the strong difference in the FLC
construct table. As ANOVA and TukeyHSD tend to lose power when the
assumptions are violated, I believe it's safe to trust a highly
significant result, even when the approach is not optimal.
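The Tukey HSD step could be sketched as follows, assuming the posted cell counts expanded to one row per plant (object names are illustrative). TukeyHSD() needs an aov() fit, which here is the same linear model as a Gaussian glm; which = "constr:vernalized" limits the output to the six pairwise comparisons among the four cell means.

```r
# Posted cell counts, expanded to one row per plant
cells <- data.frame(
  spot       = rep(0:1, times = 4),
  constr     = factor(rep(c("FLC", "FLC", "free", "free"), times = 2)),
  vernalized = factor(rep(c("no", "yes"), each = 4)),
  Freq       = c(3, 42, 1, 44, 27, 20, 3, 42)
)
plants <- cells[rep(seq_len(nrow(cells)), times = cells$Freq),
                c("spot", "constr", "vernalized")]

# Two-way ANOVA with interaction on the 0/1 response
aov_fit <- aov(spot ~ constr * vernalized, data = plants)

# Tukey HSD comparisons of the four constr:vernalized cell means
tukey <- TukeyHSD(aov_fit, which = "constr:vernalized")
print(tukey)
```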

Thank you again for the help.
Kind regards
Joris

On Sat, Mar 7, 2009 at 3:20 PM, Ted Harding <ted.hard...@manchester.ac.uk> wrote:

> On 07-Mar-09 10:57:17, Thomas Lumley wrote:
> > On Fri, 6 Mar 2009, joris meys wrote:
> >> Dear all,
> >> I have a dataset where the interaction is more than obvious,
> >> but I was asked to give a p-value, so I ran a logistic regression
> >> using glm. Oddly, in the output the interaction term is NOT
> >> significant, although that's completely counterintuitive. There
> >> are 3 variables: spot (binary response), constr (gene construct)
> >> and vernalized (growth conditions). Only for the FLC construct
> >> after vernalization should the chance of spots be lower. So in
> >> the model one would expect the interaction term to be significant.
> >>
> >> Yet only the two main terms are significant here. Could it be that
> >> my data are too sparse for these models? Am I using the wrong method?
> >
> > The point estimate for the interaction term is large: 1.79, or an
> > odds ratio of nearly 6.
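Thomas's point estimate can be checked by hand from the posted counts. In a model that fits all four cells exactly, the interaction coefficient (with FLC and "no" as reference levels) is a difference of differences of log-odds, and its Wald standard error is the square root of the sum of the reciprocal cell counts. A sketch, with illustrative names:

```r
# Plants with and without spots in each cell (from the posted data)
flc_no   <- c(spot = 42, none = 3)
flc_yes  <- c(spot = 20, none = 27)
free_no  <- c(spot = 44, none = 1)
free_yes <- c(spot = 42, none = 3)

log_odds <- function(cell) log(cell[["spot"]] / cell[["none"]])

# Vernalization effect in "free" minus vernalization effect in "FLC";
# equals the constrfree:vernalizedyes coefficient of the glm
interaction_est <- (log_odds(free_yes) - log_odds(free_no)) -
                   (log_odds(flc_yes) - log_odds(flc_no))
odds_ratio <- exp(interaction_est)

# Wald SE of this contrast: sqrt of the sum of 1/count over all 8 counts
wald_se <- sqrt(sum(1 / c(flc_no, flc_yes, free_no, free_yes)))
z <- interaction_est / wald_se
p <- 2 * pnorm(-abs(z))
print(round(c(estimate = interaction_est, odds_ratio = odds_ratio,
              se = wald_se, z = z, p = p), 3))
```

The single plant without spots in the free/not-vernalized cell contributes 1/1 = 1 of the roughly 1.82 total variance, which goes a long way to explaining why the Wald test is not significant at the 5% level despite an odds ratio of about 6.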
> >
> > The data are very strongly overdispersed (variance is 45 times larger
> > than it should be), so they don't fit a binomial model well. If you
> > used a quasibinomial model you would get no statistical significance
> > for any of the terms.
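The factor of 45 can be reproduced as glm's Pearson estimate of the dispersion. A sketch, assuming the posted data rebuilt as a numeric data frame (qfit and pearson_disp are illustrative names): because the model fits the four cell proportions exactly, the squared Pearson residuals over the 8 weighted 0/1 rows sum to the total count of 182, and glm divides that by 8 - 4 = 4 residual degrees of freedom, giving 45.5.

```r
testdata <- data.frame(
  spot       = rep(0:1, times = 4),
  constr     = factor(rep(c("FLC", "FLC", "free", "free"), times = 2)),
  vernalized = factor(rep(c("no", "yes"), each = 4)),
  Freq       = c(3, 42, 1, 44, 27, 20, 3, 42)
)

# Same model as the original post, but with an estimated dispersion
qfit <- glm(spot ~ constr * vernalized, weights = Freq, data = testdata,
            family = quasibinomial)

# Pearson dispersion: sum of squared Pearson residuals / residual df
pearson_disp <- sum(residuals(qfit, type = "pearson")^2) / df.residual(qfit)
print(pearson_disp)
print(summary(qfit)$dispersion)    # the value summary() uses to scale the SEs
print(summary(qfit)$coefficients)
```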
> >
> > I would say the problem is partly a combination of the overdispersion
> > and the sample size. It doesn't help that the situation appears to be
> > a difference between the FLC:yes cell and the other three cells, a
> > difference that is spread out over the three parameters.
> >       -thomas
>
> The following way of looking at it may be helpful. Display the data
> as two 2x2 tables (one for each level of 'constr'):
>
>                 Spot                            Spot
> constr="FLC"    1    0          constr="free"   1    0
> --------------+-------+---      --------------+-------+---
> Vern = "yes": |20   27| 47      Vern = "yes": |42    3| 45
>               |       |                       |       |
> Vern = "no" : |42    3| 45      Vern = "no" : |44    1| 45
> --------------+-------+---      --------------+-------+---
>               |62   30| 92                    |86    4| 90
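The two tables can be built directly from the posted data frame. A sketch using xtabs() and ftable() (a choice made for this example, not necessarily how Ted produced them), with illustrative object names; spot_rate is the proportion of plants with spots in each cell.

```r
testdata <- data.frame(
  spot       = rep(0:1, times = 4),
  constr     = factor(rep(c("FLC", "FLC", "free", "free"), times = 2)),
  vernalized = factor(rep(c("no", "yes"), each = 4)),
  Freq       = c(3, 42, 1, 44, 27, 20, 3, 42)
)

# 3-way table of counts: vernalized x spot x constr
tabs <- xtabs(Freq ~ vernalized + spot + constr, data = testdata)

# Flat layout: one block of rows per construct
print(ftable(tabs, row.vars = c("constr", "vernalized")))

# Proportion with spots within each vernalized x constr cell
spot_rate <- prop.table(tabs, margin = c(1, 3))[, "1", ]
print(round(spot_rate, 3))
```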
>
> It seems clear that, in the constr="free" table, there is a close
> approximation to no information about the relationship between
> 'vernalized' and 'spot'. Given the margins, even the most extreme
> possible tables (by column: (45,41)/(0,4) and (41,45)/(4,0)) each have
> probability 0.058 of occurring. The other possibilities have
> probabilities 0.250, 0.384 and 0.250.
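The probabilities above are hypergeometric: with 45 plants in each row of the "free" table and 4 spotless plants in total, the number of spotless plants that fall in the vernalized row follows a hypergeometric distribution. A one-line check with dhyper() (the variable name is illustrative); the observed table corresponds to 3 spotless plants in the vernalized row.

```r
# Number of the 4 spotless "free" plants that are in the vernalized row
spotless_in_yes <- 0:4

# m = 45 vernalized plants, n = 45 non-vernalized plants, k = 4 spotless plants
probs <- dhyper(spotless_in_yes, m = 45, n = 45, k = 4)
print(round(setNames(probs, spotless_in_yes), 3))
# rounds to 0.058 0.250 0.384 0.250 0.058
```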
>
> On the other hand, the constr="FLC" table shows a very marked
> association between 'vernalized' and 'spot'.
>
> But, given that there is not much information in the "free" table,
> you are not going to find an interaction between 'constr' and
> 'vernalized'. (You could try out glm() on each of the possible
> "free" tables, given the margins.)
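Ted's suggestion of refitting for each possible "free" table might be sketched as follows. The helper interaction_lr() is hypothetical: it rebuilds the four cells in grouped (spots, no spots) form with the FLC counts fixed and k of the 4 spotless "free" plants in the vernalized row, then compares the additive and interaction models with a likelihood ratio test. The observed table is k = 3; glm() may warn about fitted probabilities of 0 or 1 for k = 0 and k = 4, where one "free" cell has no spotless plants.

```r
interaction_lr <- function(k) {
  # Spotless counts per cell, in the order FLC/no, FLC/yes, free/no, free/yes
  spotless <- c(3, 27, 4 - k, k)
  d <- data.frame(
    constr     = factor(c("FLC", "FLC", "free", "free")),
    vernalized = factor(c("no", "yes", "no", "yes")),
    spots      = c(45, 47, 45, 45) - spotless,  # row totals minus spotless
    none       = spotless
  )
  additive <- glm(cbind(spots, none) ~ constr + vernalized,
                  family = binomial, data = d)
  full <- glm(cbind(spots, none) ~ constr * vernalized,
              family = binomial, data = d)
  # Likelihood ratio test for the interaction term (1 df)
  anova(additive, full, test = "Chisq")[2, "Pr(>Chi)"]
}

p_values <- sapply(0:4, interaction_lr)
names(p_values) <- paste0("k=", 0:4)
print(round(p_values, 4))
```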
>
> So, in my view, the aetiology of the symptoms is hypospotification
> in the "free" lifestyle ... Treatment: Increase your intake of
> "free"! Then you may get enough information about association in
> that case.
>
> Ted.
>
>
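> >> # data generation: one row per cell of the 2x2x2 table, counts in Freq
> >> # (a data.frame keeps spot and Freq numeric; matrix() would turn
> >> # them into character strings)
> >> testdata <- data.frame(
> >>   spot       = rep(0:1, times = 4),
> >>   constr     = factor(rep(c("FLC", "FLC", "free", "free"), times = 2)),
> >>   vernalized = factor(rep(c("no", "yes"), each = 4)),
> >>   Freq       = c(3, 42, 1, 44, 27, 20, 3, 42)
> >> )
> >>
> >> # model: logistic regression with the cell counts as weights
> >> T0fit <- glm(spot ~ constr * vernalized, weights = Freq, data = testdata,
> >>              family = binomial)
> >> anova(T0fit, test = "Chisq")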
> >>
> >> Kind regards
> >> Joris
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > Thomas Lumley                 Assoc. Professor, Biostatistics
> > tlum...@u.washington.edu      University of Washington, Seattle
> >
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 07-Mar-09                                       Time: 14:14:06
> ------------------------------ XFMail ------------------------------
>
