Hi all, I've spent quite a lot of time searching through the help lists and reading about how best to perform a 2-way ANOVA with unbalanced data. I realize this has been covered a great deal, so I was trying to avoid adding yet another entry to the long list of threads on the use of different SS, etc. Unfortunately, I have come to the point where I feel I have to wade in and see if someone can help me out. Hopefully I'll phrase this properly, and hopefully it will end up requiring only a simple response.
I have an experiment in which I measured a response variable (such as water content) following exposure to two treatments ("oxygen content" and "medium"). Oxygen content has three levels (5, 20, 35) and medium has two levels (Air, Water). I am interested in whether water content differs under the two treatments and whether the effect of oxygen content depends on the medium in which the experiment was conducted (Air or Water). Unfortunately, the design is unbalanced because some experimental subjects had to be removed from the experiment.

I realize that if I just use aov() to perform a two-way ANOVA, the order in which the terms ("oxygen content" and "medium") are entered will give different results because of the sequential SS. What I have done in the past is use drop1() in conjunction with aov(), i.e. drop1(aov(WaterContent ~ Oxygen * Medium, data), test = "F"), to see whether the interaction term was significant (F, p-value) and whether its inclusion improved model fit (AIC). If I determine from this that the interaction term can be removed, I rerun the model without it, test for main effects, and get F and p-values that I can report in a manuscript.

However, if the interaction term is significant and its inclusion is warranted, drop1() only provides SS, F, and p-value for the interaction term. This is fine by me, because I do not wish to interpret the main effects in the presence of a significant interaction, but reviewers of a manuscript will request an "ANOVA table" in which I will be asked to report SS, F, and p-values for the other terms. I don't have those, because drop1() only provides them for the highest-order term in the model. How best should I calculate the values that I know I will be asked to provide? I don't wish to come across as a scientist who is simply a slave to F and p-values with little regard for the data, the hypotheses, and the actual statistical interpretation.
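To make the above concrete, here is a minimal sketch of the workflow I describe, run on simulated data. The data frame `dat`, its cell sizes, and the response values are made up for illustration and are not my real data; only the model formula mirrors my actual analysis.

```r
# Simulated stand-in for my data (hypothetical values, not real measurements).
set.seed(1)
dat <- expand.grid(Oxygen = factor(c(5, 20, 35)),
                   Medium = factor(c("Air", "Water")),
                   rep    = 1:6)
dat$WaterContent <- rnorm(nrow(dat), mean = 70, sd = 5)

# Remove some subjects so that the cell sizes differ (unbalanced design).
dat <- dat[-c(1, 2, 6, 7, 12, 13, 23), ]
table(dat$Oxygen, dat$Medium)

# Sequential SS: the main-effect rows depend on the order of the terms.
fit_om <- aov(WaterContent ~ Oxygen * Medium, data = dat)
fit_mo <- aov(WaterContent ~ Medium * Oxygen, data = dat)
summary(fit_om)
summary(fit_mo)

# drop1() respects marginality, so only the interaction is tested here.
d_int <- drop1(fit_om, test = "F")
d_int

# If the interaction is dropped, both main effects become testable.
fit_add <- aov(WaterContent ~ Oxygen + Medium, data = dat)
d_main <- drop1(fit_add, test = "F")
d_main
```

With the interaction in the model, d_int has a row for Oxygen:Medium only, which is exactly the gap in the "ANOVA table" that I am asking about.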
I am interested in doing this "right", but I also know that, given the current state of our field, even if I focus on statistics that address my hypotheses of interest and choose not to discuss the main effects in isolation when an interaction exists, I will be asked to provide the "ANOVA table" with all the degrees of freedom, SS, F-values, p-values, etc. for the entire model, not just the highest-order term.

Can anyone provide advice here? Should I just use the car package and Type III SS with an appropriate contrast and not use the drop1() function, even though I'm really not interested in using Type III SS and I rather like drop1()? I am not opposed to Type II SS, but clearly, if the interaction is important, then using Type II SS, which do not consider interactions, is not appropriate.

Hopefully this is somewhat clear and doesn't simply sound like a rehashing of the same old "ANOVA and SS" story. Maybe I should be doing something completely different. I greatly appreciate constructive comments.

Thanks,
Nate

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
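P.S. For reference, here is a sketch of the Type III route I am asking about, done with drop1() rather than car. As far as I understand it, dropping every term in turn (scope = . ~ .) from a model fitted with sum-to-zero contrasts gives the usual Type III tests; please treat that equivalence as my assumption, since I have not checked it against car::Anova() here. The data are again simulated and hypothetical.

```r
# Simulated stand-in for my data (hypothetical values, not real measurements).
set.seed(1)
dat <- expand.grid(Oxygen = factor(c(5, 20, 35)),
                   Medium = factor(c("Air", "Water")),
                   rep    = 1:6)
dat$WaterContent <- rnorm(nrow(dat), mean = 70, sd = 5)
dat <- dat[-c(1, 2, 6, 7, 12, 13, 23), ]  # unbalance the design

# Fit with sum-to-zero contrasts for both factors.
fit3 <- lm(WaterContent ~ Oxygen * Medium, data = dat,
           contrasts = list(Oxygen = "contr.sum", Medium = "contr.sum"))

# scope = . ~ . asks drop1() to drop each term in turn, including the
# main effects, while leaving the interaction columns in the model.
d3 <- drop1(fit3, scope = . ~ ., test = "F")
d3

# For comparison: the default drop1() call, which tests only the interaction.
d_default <- drop1(aov(WaterContent ~ Oxygen * Medium, data = dat), test = "F")
d_default
```

This gives one row per term (Oxygen, Medium, Oxygen:Medium), and the interaction row should agree with the default drop1() output, so the table would at least be complete even if I do not interpret the main-effect rows.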