Re: [R] Tukey Kramer with ANOVA (glm)

David Winsemius Thu, 14 Jun 2012 23:45:28 -0700


On Jun 14, 2012, at 2:17 PM, Alaska_Man wrote:

Dr. Winsemius,
Really quick, a BACI is a Before-After-Control-Impact approach. I have a long time series of sea cucumber density estimates, which are taken at the same location(s) through time. Some are in areas Impacted by sea otters and some are in areas Not Impacted by sea otters (two levels). Each estimate is also coded with a "B" if it is Before the time sea otters showed up at the impacted sites or "A" if after (Impact and Control sites are both coded with BA).

So you probably need a mixed-effects analysis because you have repeated measures (how many?) in the same location. (How many locations?)

BACI analyses suggest an impact if the ANOVA interaction term (BA*Otter) is significant; i.e., changes in sea cucumber density from before to after depend on whether sea otters are present. I log transformed the response to help normalize the data, as it has many zeros.

I'm not sure that makes good sense. I don't think those are structural zeros. Knowing how difficult it is to find sea critters, I think you still have a significant possibility that one or more sea cucumbers were missed in those sites with measured zero values. For one thing, the log of 0 is not a well-defined value. For another, I think it inflates the impact of small numbers on the inferential statistic. And for a third thing, the interpretation of effects gets all confused. You are not really interested in a ratio effect measure; at least I wouldn't be. Far preferable would be to use some type of robust statistic to handle the inference issues and keep the estimates on a linear scale, perhaps using bootstrap methods. Davison and Hinkley have pre-post designs in their "practicals".
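
As a rough illustration of the bootstrap idea above, here is a minimal sketch that keeps the interaction estimate on the original (linear) scale. Everything in it is an assumption made for illustration: the built-in warpbreaks data stands in for the sea cucumber data, the helper name interaction_contrast is made up, and rows are resampled within each cell. It ignores the repeated-measures structure entirely, so it is not a recipe for the BACI design itself.

```r
# Sketch only: percentile bootstrap of a 2 x 2 interaction contrast
# on the raw response scale. warpbreaks is a stand-in data set.
set.seed(1)
d <- droplevels(subset(warpbreaks, tension %in% c("L", "M")))

# Interaction contrast from the four cell means:
# (B,M - B,L) - (A,M - A,L), in the original units of the response.
interaction_contrast <- function(dat) {
  m <- tapply(dat$breaks, list(dat$wool, dat$tension), mean)
  (m["B", "M"] - m["B", "L"]) - (m["A", "M"] - m["A", "L"])
}

# Resample rows within each wool x tension cell so no cell is ever empty.
cells <- split(seq_len(nrow(d)), interaction(d$wool, d$tension, drop = TRUE))
boot_est <- replicate(2000, {
  idx <- unlist(lapply(cells, function(i) {
    i[sample.int(length(i), replace = TRUE)]
  }), use.names = FALSE)
  interaction_contrast(d[idx, ])
})

obs <- interaction_contrast(d)
ci <- quantile(boot_est, c(0.025, 0.975))  # percentile interval
print(obs)
print(ci)
```

With clustered or repeated measurements the resampling unit would more plausibly be the site rather than the single observation; that choice is left open here.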

While Shapiro-Wilk does not suggest normality, it is a very large data set and it is "approximately" normal based on graphical examination. Again, the data is unbalanced, as there are many more estimates at the control sites and in the before period.

So you should not be using aov, since that method assumes balance. Regression methods should be appropriate.
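
One visible symptom of imbalance is that sequential sums of squares depend on the order of the terms. The sketch below is only an illustration under assumptions: warpbreaks stands in for the real data, and it is made unbalanced by dropping five rows from one cell. It shows one consequence of imbalance, not a full argument against aov().

```r
# Sketch only: sequential (Type I) sums of squares change with term order
# once the design is unbalanced. warpbreaks is a stand-in data set.
d <- droplevels(subset(warpbreaks, tension %in% c("L", "M")))
d_unbal <- d[-(1:5), ]                  # drop five rows from the A/L cell
print(table(d_unbal$wool, d_unbal$tension))

# Sum of squares attributed to wool when it enters first ...
ss_wool_first <- anova(lm(breaks ~ wool + tension,
                          data = d_unbal))["wool", "Sum Sq"]
# ... and when it enters after tension.
ss_wool_adjusted <- anova(lm(breaks ~ tension + wool,
                             data = d_unbal))["wool", "Sum Sq"]

print(c(first = ss_wool_first, after_tension = ss_wool_adjusted))
```

In a balanced design the two numbers coincide; here they do not, which is why the model comparison has to be stated explicitly.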

With that said... I would like to perform the following pairwise comparisons: B:With Otters v. A:With Otters, and B:Without Otters v. A:Without Otters.

I am performing other ANOVAs with different data and no interaction, where I would like to perform multiple pairwise comparisons between the five levels of a single factor. I used the code I provided previously and still received error messages. If I can get this BACI interaction problem figured out, I should manage to adjust it to other models. I recently came across Dunnett's Modified Tukey Kramer (DTK.test) and it appears to address the same issue of unbalanced data and has a very straightforward script (although I am not sure it lends itself to interactions?). Is this test an appropriate substitute for the glht method?

You wrote, "When your factors are both binary, the effect estimates fit nicely into a 2 x 2 table and the consideration of the single contrast added by the interaction is fairly simple"

                 wool=='A'           wool=='B'
 tension=='L'    3.7179              -0.4356+3.7179
 tension=='M'    -0.6012+3.7179      3.7179-0.4356-0.6012+0.6281

This output seems to be what I am looking for; assuming that if the value range for a comparison includes zero, then there is no significant difference?

Those are the predicted levels of log(breaks) at various combinations of wool and tension. You can pretty much always create such a table from the coefficients in a linear model. (Since you used glm() without a family argument you got the default gaussian family with its identity link, i.e. an ordinary linear model.)

 Where did those values come from?

I just read them off the output of print(model) and added the appropriate contrasts to the baseline "Intercept" which applies to the wool=="A" and tension=="L" category. With R's default treatment contrasts, all coefficients are referenced to the Intercept, and so you need to add back each of the coefficients to get estimates for the separate groups.
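
The adding-back described above can also be left to predict(). This is a minimal sketch using the same warpbreaks subset as the example in this thread; the object names grid, cell_table and by_hand are made up for illustration.

```r
# Sketch only: rebuild the 2 x 2 table of predicted log(breaks)
# instead of adding coefficients by hand.
d <- droplevels(subset(warpbreaks, tension %in% c("L", "M")))
mod <- glm(log(breaks) ~ wool * tension, data = d)

# One row per wool x tension combination.
grid <- expand.grid(wool = levels(d$wool), tension = levels(d$tension))
grid$fit <- predict(mod, newdata = grid)

# Arrange as a 2 x 2 table: rows are tension, columns are wool.
cell_table <- xtabs(fit ~ tension + wool, data = grid)
print(round(cell_table, 4))

# With treatment contrasts, the wool B / tension M cell is the sum of
# all four coefficients.
b <- coef(mod)
by_hand <- unname(b["(Intercept)"] + b["woolB"] + b["tensionM"] +
                  b["woolB:tensionM"])
print(by_hand)
```

Because the model is saturated for the two factors, each predicted value equals the mean of log(breaks) in that cell.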

I hope this helps clear up my problem. If you have concerns about pitfalls with this approach, then I would love to hear them and I can research them outside of this thread.

I would think this should be discussed with your advisor. If s/he thinks it's appropriate to get further Internet-mediated advice, then you should go either to stats.stackexchange or the R-SIG-ME mailing list where they have better minds than mine to bring to bear on designs that are hierarchical.

 This is part of a masters thesis and needs to be sound.

Thank you very much for your time.
Sean

Date: Thu, 14 Jun 2012 10:26:58 -0700
From: ml-node+s789695n4633417...@n4.nabble.com
To: seanlars...@hotmail.com
Subject: Re: Tukey Kramer with ANOVA (glm)



        
On Jun 13, 2012, at 7:36 PM, Alaska_Man wrote:

Hello,

I am performing a BACI analysis with ANOVA using the following glm:



I admit I had no idea what a "BACI analysis" might be. Looking it up,
it appears to be a cross-over design and my statistical betters have
sternly warned me about this regression briar patch in the past. I'm
especially suspicious of the lack of any statements about the balance
in the sampling in your presentation. (And for that matter the
extremely sketchy statement of design.)

fit1<-glm(log(Cucs_m+1)~(BA*Otter)+BA+Otter+ID+Primary, data=b1)



I'm guessing you do not understand that BA*Otter in an R formula
expands to BA + Otter + BA:Otter, so the separate BA and Otter terms
are redundant.
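
The expansion can be checked without any data by asking R for the terms of the formula. This is a small sketch using the variable names from the model quoted above; labels_long and labels_short are made-up names.

```r
# Sketch only: inspect how R expands the formula; no data is needed.
f_long  <- log(Cucs_m + 1) ~ (BA * Otter) + BA + Otter + ID + Primary
f_short <- log(Cucs_m + 1) ~ BA * Otter + ID + Primary

labels_long  <- attr(terms(f_long),  "term.labels")
labels_short <- attr(terms(f_short), "term.labels")

print(labels_long)
print(labels_short)

# Duplicated main effects are dropped, so both formulas describe
# the same set of model terms.
print(setequal(labels_long, labels_short))
```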

The summary(aov(fit1)) shows significance in the interaction; however,
now I would like to determine what combinations of BA and Otter are
significantly different (each factor has two levels). ID and PRIMARY
substrates are categorical and included in the model to help explain
some of the variation in the data. The data is unbalanced so I plan on
using Tukey Kramer post hoc analysis. Here is how my data is laid out,
it is a fairly substantial data set:


Editing done on original (although it proved unrevealing.)

Subdistrict   T  Year  Cucs_m  Primary  Persistence  Otter  Fishing  BA  ID
109-41,42     9  2010    0.00  sil                3      1        1   A  109-41,42
109-41,42    13  2010    2.75  rck                3      1        1   A  109-41,42
109-41,42    16  2010    2.00  rck                3      0        1   A  109-41,42
109-41,42    18  2010    8.25  rck                3      0        0   B  109-41,42

I am assuming this is an appropriate pairwise comparison analysis and I
cannot get the code to work with my data.


(What does it mean to be doing "pairwise comparisons" on two-level
factor variables?)

I am *unclear how to code it to work with the interaction*; however,
even when I attempt to use it only for a single factor, it does not
work (see below).

x<-aov(glm(Cucs_m~as.factor(BA),data=cuc))
glht(x, linfct=mcp(BA="Tukey"))
....................................
Error in mcp2matrix(model, linfct = linfct) :
  Variable(s) ‘BA’ have been specified in ‘linfct’ but cannot be found in ‘model’!


I suspect the glht() function is looking for 'BA' among the model's
variables and finding only 'as.factor(BA)'. If BA is not already a
factor, then it would make sense to do:

  cuc$BA <- factor(cuc$BA)

.... before any analysis. Notice in the example below that glht()
warns that its default contrasts might be inappropriate when the model
contains interactions. If you do not know what you are doing here (and
your proposed analysis hints at that possibility), I may have set a
trap for you by solving a syntactic problem but not solving a
conceptual problem.
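
The naming issue can be seen with base R alone. This is a minimal sketch: warpbreaks stands in for the cuc data, which is not shown in full, and the columns tension_chr and tension_fac are hypothetical names created only for the demonstration.

```r
# Sketch only: how wrapping a variable in as.factor() inside the formula
# changes the name under which the model knows it.
d <- droplevels(subset(warpbreaks, tension %in% c("L", "M")))
d$tension_chr <- as.character(d$tension)  # mimic a non-factor grouping column

# Converting inside the formula: the term is named after the whole call.
fit_inline <- lm(log(breaks) ~ as.factor(tension_chr), data = d)
print(names(fit_inline$xlevels))

# Converting in the data first: the term keeps the plain column name,
# which is the name a call such as mcp(tension_fac = "Tukey") would use.
d$tension_fac <- factor(d$tension_chr)
fit_clean <- lm(log(breaks) ~ tension_fac, data = d)
print(names(fit_clean$xlevels))
```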

library(multcomp)  # provides glht() and mcp()
mod <- glm(log(breaks) ~ wool*tension,
           data=subset(warpbreaks, tension %in% c("L","M")))
glht(mod, linfct=mcp(tension="Tukey"))

         General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Linear Hypotheses:
           Estimate
M - L == 0  -0.6012

Warning message:
In mcp2matrix(model, linfct = linfct) :
  covariate interactions found -- default contrast might be inappropriate

---------------
Looking at the mod-object you see that the "Estimate" above is
actually NOT what you had interest in. You were presumably more
interested in the contrast woolB:tensionM whose coefficient was 0.6281.

----
Coefficients:
   (Intercept)           woolB        tensionM  woolB:tensionM
        3.7179         -0.4356         -0.6012          0.6281
----------
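
To make the point about the "Estimate" concrete: under treatment contrasts the tensionM coefficient is the M - L difference for wool A only, and the difference for wool B adds the interaction coefficient. Here is a minimal base-R sketch of both simple effects with unadjusted standard errors; the matrix name K and its row labels are made up for illustration, and no multiplicity correction is applied.

```r
# Sketch only: simple effects of tension within each wool, built as
# linear combinations of the coefficients (no multiplicity adjustment).
d <- droplevels(subset(warpbreaks, tension %in% c("L", "M")))
mod <- glm(log(breaks) ~ wool * tension, data = d)
b <- coef(mod)
V <- vcov(mod)

# Columns follow the coefficient order:
# (Intercept), woolB, tensionM, woolB:tensionM
K <- rbind(
  "M - L at wool A" = c(0, 0, 1, 0),
  "M - L at wool B" = c(0, 0, 1, 1)
)
colnames(K) <- names(b)

est <- drop(K %*% b)                  # contrast estimates
se  <- sqrt(diag(K %*% V %*% t(K)))   # their standard errors
simple_effects <- cbind(estimate = est, se = se)
print(round(simple_effects, 4))
```

A matrix like K can also be handed to glht() through its linfct argument if adjusted tests are wanted.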


I would have instead done something like this:

mod <- glm(log(breaks) ~ wool*tension,
           data=subset(warpbreaks, tension %in% c("L","M")))
mod2 <- glm(log(breaks) ~ wool+tension,
            data=subset(warpbreaks, tension %in% c("L","M")))
anova(mod,mod2)

Analysis of Deviance Table

Model 1: log(breaks) ~ wool * tension
Model 2: log(breaks) ~ wool + tension
  Resid. Df Resid. Dev Df Deviance
1        32     4.6235
2        33     5.5113 -1 -0.88777
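
The table above lists deviances but no test, because none was requested. Here is a minimal sketch of asking for one, assuming an F test is the suitable choice for a gaussian glm where the dispersion is estimated from the data. The smaller model is listed first so that the Df column comes out positive; cmp and f_stat are made-up names.

```r
# Sketch only: the same nested-model comparison with an explicit F test.
d <- droplevels(subset(warpbreaks, tension %in% c("L", "M")))
mod  <- glm(log(breaks) ~ wool * tension, data = d)
mod2 <- glm(log(breaks) ~ wool + tension, data = d)

cmp <- anova(mod2, mod, test = "F")
print(cmp)

# F statistic for adding the interaction term.
f_stat <- cmp$F[2]
print(f_stat)
```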


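Now I can say that the addition of an interaction term reduced the
residual deviance by 0.888 on 1 df when measured on the log(breaks)
scale. Because no test= argument was given, the table reports no
p-value, and for a gaussian model the deviance change has to be scaled
by the estimated dispersion before it can be judged: from the values
printed above, F = 0.88777/(4.6235/32), or about 6.1 on 1 and 32 df,
which is not a negligible improvement. (Note: the raw deviance is scale
dependent, so it cannot be compared directly with the much larger value
one sees on the untransformed scale.) When your factors are both
binary, the effect estimates fit nicely into a 2 x 2 table and the
consideration of the single contrast added by the interaction is fairly
simple.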


                 wool=='A'           wool=='B'
 tension=='L'    3.7179              -0.4356+3.7179
 tension=='M'    -0.6012+3.7179      3.7179-0.4356-0.6012+0.6281

Can anyone offer suggestions on potential problems with my approach
and/or script issues?



Why was the log transformation being done? Is the desired outcome a
statement about ratios?


--
David Winsemius, MD
West Hartford, CT


______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



David Winsemius, MD
West Hartford, CT
