[R] multiple comparison correction

Jake Kami Mon, 11 Oct 2010 12:09:07 -0700

dear list,

i just found this post in the archive:


----------------

On 23-Apr-05 Bill.Venables at csiro.au
<https://stat.ethz.ch/mailman/listinfo/r-help> wrote:
>:* -----Original Message-----*>:* From: r-help-bounces at stat.math.ethz.ch 
><https://stat.ethz.ch/mailman/listinfo/r-help> *>:* [mailto:r-help-bounces at 
>stat.math.ethz.ch <https://stat.ethz.ch/mailman/listinfo/r-help>] On Behalf Of 
>*>:* michael watson (IAH-C)*>:* [...]*>:* I have a highly significant 
>interaction term. In the context*>:* of the experiment, this makes sense. I 
>can visualise the data *>:* graphically, and sure enough I can see that both 
>factors have*>:* different effects on the data DEPENDING on what the value 
>of*>:* the other factor is.  *>:* *>:* I explain this all to my colleague - 
>and she asks "but which*>:* ones are different?" This is best illustrated with 
>an example.*>:* We have either infected | uninfected, and vaccinated | 
>unvaccinated*>:* (the two factors).*>:* We're measuring expression of a gene.  
>Graphically, in the*>:* infected group, vaccination makes expression go up. In 
>the*>:* uninfected group, vaccination makes expression go down. In*>:* bo!
 th the vaccinated and unvaccinated groups, infection makes*>:* expression go 
down, but it goes down further in unvaccinated*>:* than it does in 
vaccinated.*>:* *>:* So from a statistical point of view, I can see exactly 
why*>:* the interaction term is significant, but what my colleage*>:* wants to 
know is that WITHIN the vaccinated group, does*>:* infection decrease 
expression significantly? And within*>:* the unvaccinated group, does infection 
decrease expression*>:* significantly?  Etc etc etc  Can I get this information 
from*>:* the output of the ANOVA, or do I carry out a separate*>:* test on e.g. 
just the vaccinated group? (seems a cop out to me)*>* *>* No, you can't get 
this kind of specific information out of the anova*>* table and yes, anova 
tables *are* a bit of a cop out.  (I sometimes *>* think they should only be 
allowed between consenting adults in*>* private.)*
I think the "cop out" Michael Watson was referring to means
going back to basics and doing a separate analysis on each group
(though no doubt using the Res SS from the AoV table).

Not that I disagree with your comment: I sometimes think that
anova tables are often passed round between adults in order to
induce consent which might otherwise have been withheld.

>* What you are asking for is a non-standard, but perfectly*>* reasonable 
>partition of the degrees of freedom between the*>* classes of a single factor 
>with four levels got by pairing*>* up the levels of vaccination and 
>innoculation. Of course you*>* can get this information, but you have to do a 
>bit of work*>* for it.  *
It seems to me that this is a wrapper for "separate analysis
on each group"!

>* Before I give the example which I don't expect too many people*>* to read 
>entirely, let me issue a little challenge, namely to*>* write tools to 
>automate a generalized version of the procedure*>* below.*
[technical setup snipped]

>>* contrasts(dat$vac_inf) <- ginv(m)*>>* gm <- aov(y ~ vac_inf, dat)*>>* 
>>summary(gm)*>*             Df  Sum Sq Mean Sq F value  Pr(>F)*>* vac_inf      
>>3 12.1294  4.0431   7.348 0.04190*>* Residuals    4  2.2009  0.5502*>* *>* 
>>This doesn't tell us too much other than there are differences,*>* probably.  
>>Now to specify the partition:*>*                 *>>* summary(gm, *>*       
>>split = list(vac_inf = list("- vs +|N" = 1, *>*                               
>>    "- vs +|Y" = 2)))*>*                     Df  Sum Sq Mean Sq F value  
>>Pr(>F)*>* vac_inf              3 12.1294  4.0431  7.3480 0.04190*>*   
>>vac_inf: - vs +|N  1  7.9928  7.9928 14.5262 0.01892*>*   vac_inf: - vs +|Y  
>>1  3.7863  3.7863  6.8813 0.05860*>* Residuals            4  2.2009  0.5502*
Wow, Bill! Dazzling. This is like watching a rabbit hop
into a hat, and fly out as a dove. I must study this syntax.
But where can I find out about the "split" argument to "summary"?
I've found the *function* split, whose effect is similar, but
I've wandered around the "summary", "summary.lm" etc. forest
for a while without locating the *argument*.

My naive ("cop-out") approach would have been on the lines
of (without setting up the contrast matrix):

  summary(aov(y~vac*inf,data=dat))
              Df  Sum Sq Mean Sq F value  Pr(>F)
  vac          1  0.3502  0.3502  0.6364 0.46968
  inf          1 11.3908 11.3908 20.7017 0.01042 *
  vac:inf      1  0.3884  0.3884  0.7058 0.44812
  Residuals    4  2.2009  0.5502

so we get the 2.2009 on 4 df SS for redisuals with mean SS 0.5502.

Then I would do:

  mNp<-mean(y[(vac=="N")&(inf=="+")])
  mNm<-mean(y[(vac=="N")&(inf=="-")])
  mYp<-mean(y[(vac=="Y")&(inf=="+")])
  mYm<-mean(y[(vac=="Y")&(inf=="-")])

  c(     mYp,       mYm,       mNp,      mNm       )
  ##[1]  2.4990492  0.5532018  2.5212655 -0.3058972

  c(    mYp-mYm,   mNp-mNm )
  ##[1] 1.945847   2.827163


after which:

  1-pt(((mYp-mYm)/sqrt(0.5502)),4)
  ##[1] 0.02929801

  1-pt(((mNp-mNm)/sqrt(0.5502)),4)
  ##[1] 0.009458266

give you 1-sided t-tests, and

  1-pf(((mYp-mYm)/sqrt(0.5502))^2,1,4)
  ##[1] 0.05859602

  1-pf(((mNp-mNm)/sqrt(0.5502))^2,1,4)
  ##[1] 0.01891653

give you F-tests (equivalent to 2-sided t-tests) which agree
with Bill's results above.

And, in this case, presenting the results as mean differences
shows that the effect of infection appears to differ between
vaccinated and unvaccinated groups; but a simple test shows
this not to be significant:

  1-pf( (sqrt(1/2)*((mYp-mYm)-(mNp-mNm))/sqrt(0.5502))^2, 1,4)
  ##[1] 0.4481097

As I said above, this would be my naive approach to this
particular case (and to any like it), and I would expect
to be able to explain all this in simple terms to a colleague
who was asking the sort of questions you have reported.
Or is it the case that offering an anova table is needed,
in order to evoke consent to the results by virtue of the
familiar format?

>* As expected, infection changes the mean for both vaccinated and*>* 
>unvaccinated, as we arranged when we generated the data.*
The challenge to generalise is interesting. However, as implied
above, I'm already impressed by Bill's footwork in R for this
simple case, and it might be some time before I'm fluent
enough in that sort of thing to deal with more complicated
cases, let alone the general one.

For users like myself, a syntax whose terms are closer to
ordinary language would be more approachable. Something
on the lines of

  summary(aov(y ~ vac*inf), by=inf, within=vac)

which would present a table similar to Bill's above (by inf
within the different levels of vac).

Intriguing. The challenge is tempting ... !

Best wishes,
Ted.
--------


This way for simple main effects testing looks reasonable to me. I
just wanted to know whether the user should have applied a multiple
comparison correction in this case (bonferroni, FDR, etc)?


cheers

Jake

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] multiple comparison correction

Reply via email to