Hi: I'd suggest looking at the following plot (data in original post, copied below):
  library(lattice)
  stripplot(Intensity ~ Group, data = zzzanova)

Some things stand out in this plot that merit attention. As Josh Wiley
pointed out in an earlier reply, the concentration of -4.60517 values in
this data set should be a concern. Of your 54 responses, 26 have this
value. It occurs in all six groups, but it is concentrated in the first
few groups and becomes rarer in the later ones. Consider:

> with(zzzanova, table(Group))
Group
Group1 Group2 Group3 Group4 Group5 Group6
     9      8      9     10      9      9

> with(subset(zzzanova, Intensity < -1), table(Intensity, Group))
          Group
Intensity  Group1 Group2 Group3 Group4 Group5 Group6
  -4.60517      8      5      4      4      4      1

Also note that there is one other negative intensity, in group 6
(-0.024). (Side note: Is negative intensity meaningful in the context of
your scientific problem?)

I'm curious about what this -4.60 value is supposed to represent. Is it
a missing value code, as Josh inferred, a left-censoring value, or
something else? The reason for asking is that its purpose could well
have an impact on the type of analysis that is appropriate for these
data:

* if -4.60 is meant to be a missing value code, its inclusion greatly
  inflates the degrees of freedom relative to what is actually present
  in the data. Moreover, its presence inflates the variability both
  within and between groups, which reduces the sensitivity of tests.
  Also observe from the strip plot above that, if -4.60 is a missing
  value code, your non-missing data appear to increase in variance with
  increasing group number; moreover, the imbalance in observations
  between groups is more severe, which in turn makes the p-values of
  tests less reliable for the non-missing data. (Even worse, the
  imbalance and the increasing variance don't appear to be independent
  if -4.60 represents a missing value.)
* if -4.60 is meant to be something like a lower detection limit or
  upper bound on a left-censored response, then one is artificially
  reducing variability within groups and the p-values of tests turn out
  to be optimistic. There are better ways to handle nondetects. One
  approach is outlined in Helsel's book 'Nondetects and Data Analysis',
  which is the reference for the R package NADA. Other approaches to
  nondetect data are also available outside of R (e.g., the SCOUT
  software at EPA).
* if -4.60 is meant to represent a lower bound on a (censored?)
  response, then setting -4.60 as the response artificially increases
  the variation in the data. In this case, one would be artificially
  left-truncating the data, but for a different reason from that given
  in the immediately preceding point.

Josh also noted the difference in p-values of the between-group test
when the groups were a factor as opposed to numeric. This is a common
'gotcha' in R - you need to pay attention to the classes of your inputs
when fitting any statistical model. When the group variable is a factor,
R performs a one-way ANOVA. [FWIW, I got the same p-value as Josh for
your 'complete' data (54 obs.)] When the group variable is numeric, R
fits a simple linear regression, which implies that the numeric values
of the groups are meaningful. There is a big difference in the
interpretation of the two types of models.

HTH,
Dennis

On Tue, Jul 6, 2010 at 6:12 AM, Amit Patel <amitrh...@yahoo.co.uk> wrote:

> Sorry i had a misprint in the appendix code in the last email
>
> Hi I needed some help with ANOVA
>
> I have a problem with My ANOVA analysis. I have a dataset with a known
> ANOVA p-value, however I can not seem to re-create it in R.
>
> I have created a list (zzzanova) which contains
> 1)Intensity Values
> 2)Group Number (6 Different Groups)
> 3)Sample Number (54 different samples)
> this is created by the script in Appendix 1
>
> I then conduct ANOVA with the command
>
> zzz.aov <- aov(Intensity ~ Group, data = zzzanova)
>
> I get a p-value of
> Pr(>F)1
> 0.9483218
>
> The expected p-value is 0.00490 so I feel I maybe using ANOVA
> incorrectly or have put in a wrong formula. I am trying to do an ANOVA
> analysis across all 6 Groups. Is there something wrong with my formula.
> But I think I have made a mistake in the formula rather than anything
> else.
>
> APPENDIX 1
>
> datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 3.003749, -4.60517,
>     2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, -4.60517,
>     -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, -4.60517, 2.39127,
>     2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, -4.60517, 2.121427, 1.973118,
>     -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, 0.023703, -4.60517,
>     2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 1.371895, 1.533227)
>
> "zzzanova" <-
> structure(list(Intensity = datalist,
>     Group = structure(c(1,1,1,1,1,1,1,1,1,
>         2,2,2,2,2,2,2,2,
>         3,3,3,3,3,3,3,3,3,
>         4,4,4,4,4,4,4,4,4,4,
>         5,5,5,5,5,5,5,5,5,
>         6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3",
>         "Group4", "Group5", "Group6"), class = "factor"),
>     Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9,
>         10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
>         20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
>         30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
>         40,41,42,43,44,45,46,47,48,49,50,51,52,53,54)
>     ))
>     , .Names = c("Intensity", "Group", "Sample"), row.names =
>     c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
>     "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
>     "21", "22", "23", "24", "25", "26", "27", "28", "29",
>     "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40",
>     "41", "42", "43", "44", "45", "46", "47", "48", "49", "50",
>     "51", "52", "53", "54"),class = "data.frame")
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
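
P.S. To make two of the points in my reply concrete, here is a minimal
sketch in base R. It rests on an ASSUMPTION that has not been confirmed,
namely that -4.60517 is a missing value code; if it is really a
detection limit, this is not the right analysis. The names dat,
GroupNum and IntensityNA are mine, chosen for illustration, and are not
in the original post. The sketch rebuilds the data, recodes -4.60517 as
NA, and then fits the model twice: once with the group as a factor
(one-way ANOVA) and once with the group as a number (simple linear
regression).

```r
# Intensities copied from Appendix 1 of the original post, one row per group.
intensity <- c(
  -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 3.003749, -4.60517,
  2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, -4.60517,
  -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, -4.60517, 2.39127,
  2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, -4.60517, 2.121427, 1.973118,
  -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, 0.023703, -4.60517,
  2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 1.371895, 1.533227
)

# Group sizes 9, 8, 9, 10, 9, 9, matching the table(Group) output in the reply.
group_num <- rep(1:6, times = c(9, 8, 9, 10, 9, 9))

dat <- data.frame(
  Intensity = intensity,
  GroupNum  = group_num,                                        # numeric group
  Group     = factor(group_num, labels = paste0("Group", 1:6))  # factor group
)

# ASSUMPTION: -4.60517 is a missing value code, so recode it as NA.
floor_code <- -4.60517
dat$IntensityNA <- ifelse(abs(dat$Intensity - floor_code) < 1e-8,
                          NA_real_, dat$Intensity)

# How many responses carry the code, and how they fall across groups.
n_coded <- sum(is.na(dat$IntensityNA))
coded_by_group <- table(dat$Group[is.na(dat$IntensityNA)])

# Group as a factor: one-way ANOVA with 5 df between groups.
fit_factor <- aov(IntensityNA ~ Group, data = dat)

# Group as a number: simple linear regression with a single slope (1 df).
fit_numeric <- lm(IntensityNA ~ GroupNum, data = dat)

print(coded_by_group)
print(summary(fit_factor))
print(anova(fit_numeric))
```

With the coded values dropped (aov and lm omit NA rows by default),
only 28 of the 54 observations remain, so the factor fit has far fewer
residual degrees of freedom than the fit to the 'complete' data.
Incidentally, exp(-4.60517) is about 0.01, which may hint that a small
constant was substituted before taking logs, but that is only a guess
on my part.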