Re: [R] Help With ANOVA (corrected please ignore last email)

Dennis Murphy Wed, 07 Jul 2010 04:40:06 -0700

Hi:

I'd suggest looking at the following plot (data in original post, copied
below):


library(lattice)
stripplot(Intensity ~ Group, data = zzzanova)

Some things stand out in this plot that merit attention.

As Josh Wiley pointed out in an earlier reply, the concentration of -4.60517
values in this data set should be a concern. Of your 54 responses, 26 have
this value, spread across all six groups; moreover, the value is concentrated
in the first few groups and becomes rarer in the later ones. Consider:

> with(zzzanova, table(Group))
Group
Group1 Group2 Group3 Group4 Group5 Group6
     9      8      9     10      9      9
> with(subset(zzzanova, Intensity < -1), table(Intensity, Group))
          Group
Intensity  Group1 Group2 Group3 Group4 Group5 Group6
  -4.60517      8      5      4      4      4      1
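To put numbers on that pattern, here is a small sketch. It rebuilds zzzanova with data.frame() from the values in the original post (assumed equivalent to the poster's structure() code, not a copy of it), flags the repeated value and tabulates it by group. The name is_code is only illustrative.

```r
# Values from the original post, one group per line (assumed equivalent to the
# poster's structure() call), rebuilt with data.frame().
datalist <- c(
  -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 3.003749, -4.60517, # Group1
  2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, -4.60517,           # Group2
  -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, -4.60517, 2.39127,   # Group3
  2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, -4.60517, 2.121427, 1.973118, # Group4
  -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, 0.023703, -4.60517,  # Group5
  2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 1.371895, 1.533227  # Group6
)
zzzanova <- data.frame(
  Intensity = datalist,
  Group = factor(rep(paste0("Group", 1:6), times = c(9, 8, 9, 10, 9, 9))),
  Sample = 1:54
)

# Flag the repeated value; a tolerance is used instead of == to be safe
# with floating-point comparisons.
is_code <- abs(zzzanova$Intensity - (-4.60517)) < 1e-8
table(is_code, zzzanova$Group)                   # flagged/unflagged counts per group
round(tapply(is_code, zzzanova$Group, mean), 2)  # proportion flagged per group

# Observation only: exp(-4.60517) is 0.01 to about six decimal places, so the
# value is consistent with log(0.01); whether that explains it is unknown.
exp(-4.60517)
```

If the proportions drop steadily from Group1 to Group6, the value is clearly not scattered at random across groups.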

Also note that there is one other negative intensity value, in group 6
(-0.024). (Side note: is a negative intensity meaningful in the context of
your scientific problem?)

I'm curious about what this -4.60 value is supposed to represent. Is it a
missing value code, as Josh inferred, a left-censoring value, or something
else? I ask because its purpose could well affect which type of analysis is
appropriate for these data:

    *  if -4.60 is meant to be a missing value code, including it greatly
       inflates the degrees of freedom relative to those actually present in
       the data (54 - 6 = 48 residual df for the one-way ANOVA instead of
       28 - 6 = 22). Its presence also inflates the variability both within
       and between groups, which reduces the sensitivity of tests. Observe
       from the strip plot above that if -4.60 is a missing value code, your
       non-missing data appear to increase in variance with increasing group
       number, and the imbalance in observations between groups becomes more
       severe (1, 3, 5, 6, 5 and 8 non-missing values in groups 1-6), which in
       turn makes the p-values of tests less reliable for the non-missing data.
       (Even worse, the imbalance and the increasing variance don't appear to
       be independent if -4.60 represents a missing value.)
    *  if -4.60 is meant to be something like a lower detection limit or an
       upper bound on a left-censored response, then using it as the response
       artificially reduces the variability within groups, and the p-values of
       tests turn out to be optimistic. There are better ways to handle
       nondetects. One approach is outlined in Helsel's book 'Nondetects and
       Data Analysis', which is the reference for the R package NADA. Other
       approaches to nondetect data are also available outside of R (e.g., the
       SCOUT software from the EPA).
    *  if -4.60 is meant to represent a lower bound on a (censored?)
       response, then setting -4.60 as the response artificially increases the
       variation in the data. In this case, one would be artificially
       left-truncating the data, but for a different reason from that given in
       the immediately preceding point.
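A rough sketch of how the first two readings change the analysis, assuming (not knowing) what the code means. The first part recodes -4.60517 to NA and refits the ANOVA. The second treats it as a left-censoring point and fits a Gaussian censored-regression (Tobit-type) model with survreg() from the recommended survival package; that is a swapped-in illustration, not the Helsel/NADA methodology mentioned above, and it assumes normal errors with a common variance, which the strip plot already calls into question. Object names such as zz_na and zz_cens are hypothetical.

```r
library(survival)  # recommended package shipped with R; provides Surv() and survreg()

# Values from the original post, one group per line (assumed equivalent reconstruction)
datalist <- c(
  -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 3.003749, -4.60517, # Group1
  2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, -4.60517,           # Group2
  -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, -4.60517, 2.39127,   # Group3
  2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, -4.60517, 2.121427, 1.973118, # Group4
  -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, 0.023703, -4.60517,  # Group5
  2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 1.371895, 1.533227  # Group6
)
zzzanova <- data.frame(
  Intensity = datalist,
  Group = factor(rep(paste0("Group", 1:6), times = c(9, 8, 9, 10, 9, 9))),
  Sample = 1:54
)
is_code <- abs(zzzanova$Intensity - (-4.60517)) < 1e-8

## Reading 1 (assumption): -4.60517 is a missing-value code.
zz_na <- zzzanova
zz_na$Intensity[is_code] <- NA
n_obs <- with(zz_na, tapply(Intensity, Group, function(x) sum(!is.na(x))))
n_obs                                                    # non-missing values per group
with(zz_na, tapply(Intensity, Group, sd, na.rm = TRUE))  # SD per group (NA when n = 1)
fit_all <- aov(Intensity ~ Group, data = zzzanova)       # all 54 values
fit_na  <- aov(Intensity ~ Group, data = zz_na)          # NA rows dropped (na.omit)
c(all = df.residual(fit_all), codes_as_NA = df.residual(fit_na))
summary(fit_na)

## Reading 2 (assumption): -4.60517 is a detection limit, i.e. the true value
## lies somewhere below it (left-censored). Gaussian censored regression
## (Tobit-type model); NOT the NADA/Helsel methods mentioned above.
zz_cens <- zzzanova
zz_cens$observed <- !is_code                             # TRUE = measured, FALSE = censored
fit1 <- survreg(Surv(Intensity, observed, type = "left") ~ Group,
                data = zz_cens, dist = "gaussian")
fit0 <- survreg(Surv(Intensity, observed, type = "left") ~ 1,
                data = zz_cens, dist = "gaussian")
lr <- 2 * (as.numeric(logLik(fit1)) - as.numeric(logLik(fit0)))
pchisq(lr, df = 5, lower.tail = FALSE)                   # likelihood-ratio test of Group
```

Note that Group1 keeps only one measured value under either reading, so any group comparison leans heavily on the later groups.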

Josh also noted the difference in the p-values of the between-group test when
the groups were coded as a factor as opposed to numeric. This is a common
'gotcha' in R: you need to pay attention to the classes of your inputs when
fitting any statistical model. When the groups comprise a factor, R performs
a one-way ANOVA, estimating a separate mean for each group (5 df for Group).
[FWIW, I got the same p-value as Josh for your 'complete' data (54 obs.).]
When the group variable is numeric, you are fitting a simple linear
regression (1 df for the slope), which implies that the numeric values of the
groups are meaningful. The two types of models have very different
interpretations.
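A quick sketch of the factor-versus-numeric difference, again using a data.frame() reconstruction of the posted data (assumed equivalent to the appendix code). The only change between the two fits is whether Group enters as a factor or as its integer codes 1-6 via as.numeric().

```r
# Values from the original post, one group per line (assumed equivalent reconstruction)
datalist <- c(
  -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 3.003749, -4.60517, # Group1
  2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, -4.60517,           # Group2
  -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, -4.60517, 2.39127,   # Group3
  2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, -4.60517, 2.121427, 1.973118, # Group4
  -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, 0.023703, -4.60517,  # Group5
  2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 1.371895, 1.533227  # Group6
)
zzzanova <- data.frame(
  Intensity = datalist,
  Group = factor(rep(paste0("Group", 1:6), times = c(9, 8, 9, 10, 9, 9)))
)

# Same response, two codings of Group.
fit_factor  <- lm(Intensity ~ Group, data = zzzanova)              # one mean per group
fit_numeric <- lm(Intensity ~ as.numeric(Group), data = zzzanova)  # one slope over codes 1..6
anova(fit_factor)   # Group term on 5 df: one-way ANOVA
anova(fit_numeric)  # as.numeric(Group) on 1 df: simple linear regression

# aov() with the factor gives the same F test as anova(fit_factor).
summary(aov(Intensity ~ Group, data = zzzanova))
```

Calling str(zzzanova) before fitting is a cheap way to confirm which of the two models you are actually getting.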

HTH,
Dennis


On Tue, Jul 6, 2010 at 6:12 AM, Amit Patel <amitrh...@yahoo.co.uk> wrote:

> Sorry, I had a misprint in the appendix code in the last email.
>
>
> Hi, I need some help with ANOVA.
>
> I have a problem with my ANOVA
> analysis. I have a dataset with a known ANOVA p-value; however, I
> cannot seem to re-create it in R.
>
> I have created a list (zzzanova) which contains
> 1)Intensity Values
> 2)Group Number (6 Different Groups)
> 3)Sample Number (54 different samples)
> this is created by the script in Appendix 1
>
> I then conduct ANOVA with the command
> > zzz.aov <- aov(Intensity ~ Group, data = zzzanova)
>
> I get a p-value of
> Pr(>F)1
> 0.9483218
>
> The
> expected p-value is 0.00490, so I feel I may be using ANOVA incorrectly
> or have put in a wrong formula. I am trying to do an ANOVA analysis
> across all 6 groups. Is there something wrong with my formula? I think
> I have made a mistake in the formula rather than anything else.
>
>
>
>
> APPENDIX 1
>
> datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517,
> -4.60517, 3.003749, -4.60517,
>    2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743,
> -4.60517,
>    -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075,
> -4.60517, 2.39127,
>    2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162,
> -4.60517, 2.121427, 1.973118,
>    -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517,
>  0.023703, -4.60517,
>    2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517,
> 1.371895, 1.533227)
>
> "zzzanova" <-
> structure(list(Intensity = datalist,
> Group = structure(c(1,1,1,1,1,1,1,1,1,
>         2,2,2,2,2,2,2,2,
>         3,3,3,3,3,3,3,3,3,
>         4,4,4,4,4,4,4,4,4,4,
>         5,5,5,5,5,5,5,5,5,
>         6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3",
> "Group4", "Group5", "Group6"), class = "factor"),
>    Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9,
>    10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
>    20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
>    30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
> 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54)
> ))
> , .Names = c("Intensity",
> "Group", "Sample"), row.names =
> c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
> "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
> "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
> "31", "32", "33", "34", "35", "36", "37", "38", "39", "40",
> "41", "42", "43", "44", "45", "46", "47", "48", "49", "50",
> "51", "52", "53", "54"),class = "data.frame")
>
>
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
