R gurus,
I'm working on the data analysis for a small project. My response
variable is total vines per tree (median = 0, mean = 1.65, min = 0,
max = 24). My predictors are two categorical variables (site and
species, four levels each) and one continuous variable (tree diameter
at breast height, DBH). The main question I'm trying to answer is
whether the species identity of a tree has any effect on the number
of vines clinging to its trunk. Because the response variable is
count data, I decided to use Poisson regression, even though I'm not
as familiar with it as I am with linear or logistic regression.
My problem is deciding which model to use. I have fit three: one
without interaction terms (Total.vines~Site+Species+DBH), one with an
interaction between Site and Species (Total.vines~Site*Species+DBH),
and one with interactions among all three variables
(Total.vines~Site*Species*DBH). Here is my output from R for the
first two models (the third model has the same significant terms, in
both number and identity, as the second, even though it has more
interaction terms overall):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Call:
glm(formula = Total.vines ~ Site + Species + DBH, family = poisson)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-5.2067  -1.2915  -0.7095  -0.3525   6.3756

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)     -2.987695   0.231428 -12.910  < 2e-16 ***
SiteHuffman Dam  2.725193   0.249423  10.926  < 2e-16 ***
SiteNarrows      1.902987   0.227599   8.361  < 2e-16 ***
SiteSugar Creek  1.752754   0.242186   7.237 4.58e-13 ***
SpeciesFRAM      0.955468   0.157423   6.069 1.28e-09 ***
SpeciesPLOC      1.187903   0.141707   8.383  < 2e-16 ***
SpeciesULAM      0.340792   0.184615   1.846   0.0649 .
DBH              0.020708   0.001292  16.026  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1972.3 on 544 degrees of freedom
Residual deviance: 1290.0 on 537 degrees of freedom
AIC: 1796.0
Number of Fisher Scoring iterations: 6
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Call:
glm(formula = Total.vines ~ Site * Species + DBH, family = poisson,
data = sycamores.1)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.9815  -1.2370  -0.6339  -0.3403   6.5664

Coefficients: (3 not defined because of singularities)
                              Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -2.788243   0.303064  -9.200  < 2e-16 ***
SiteHuffman Dam               1.838952   0.354127   5.193 2.07e-07 ***
SiteNarrows                   2.252716   0.323184   6.970 3.16e-12 ***
SiteSugar Creek             -12.961519 519.152077  -0.025 0.980082
SpeciesFRAM                  13.938716 519.152230   0.027 0.978580
SpeciesPLOC                   0.240223   0.540676   0.444 0.656824
SpeciesULAM                   1.919586   0.540246   3.553 0.000381 ***
DBH                           0.019984   0.001337  14.946  < 2e-16 ***
SiteHuffman Dam:SpeciesFRAM -11.513823 519.152294  -0.022 0.982306
SiteNarrows:SpeciesFRAM     -13.593127 519.152268  -0.026 0.979111
SiteSugar Creek:SpeciesFRAM         NA         NA      NA       NA
SiteHuffman Dam:SpeciesPLOC         NA         NA      NA       NA
SiteNarrows:SpeciesPLOC       0.397503   0.555218   0.716 0.474028
SiteSugar Creek:SpeciesPLOC  15.640450 519.152277   0.030 0.975966
SiteHuffman Dam:SpeciesULAM  -0.102841   0.610027  -0.169 0.866124
SiteNarrows:SpeciesULAM      -2.809092   0.606804  -4.629 3.67e-06 ***
SiteSugar Creek:SpeciesULAM         NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1972.3 on 544 degrees of freedom
Residual deviance: 1178.7 on 531 degrees of freedom
AIC: 1696.6
Number of Fisher Scoring iterations: 13
%%%%%%%%%%%%%%%%%%%%
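Two quick figures can be computed directly from the deviances reported in the summaries above. This is only a rough sketch: the variable names are mine, the numbers are copied by hand from the output, and the chi-square comparison assumes the Poisson model is otherwise adequate, which I have not verified.

```r
# Residual deviances and degrees of freedom copied from the two summaries above.
dev_additive <- 1290.0; df_additive <- 537   # Site + Species + DBH
dev_inter    <- 1178.7; df_inter    <- 531   # Site * Species + DBH

# Residual deviance per degree of freedom. Values well above 1 can be a sign
# of overdispersion relative to the Poisson assumption.
ratio_additive <- dev_additive / df_additive
ratio_inter    <- dev_inter / df_inter
print(round(c(additive = ratio_additive, interaction = ratio_inter), 2))

# Likelihood-ratio (deviance) comparison of the two nested models.
lr_stat <- dev_additive - dev_inter    # drop in deviance from adding Site:Species
lr_df   <- df_additive - df_inter      # number of extra estimable coefficients
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p.value = p_value))
```

Both ratios come out above 2, so any p-value from this comparison should probably be read with caution.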
As you can see, the two models give very different output, especially
with regard to which individual species are significant. In the
no-interaction model, the only species that was not significant was
ULAM. In the Site*Species interaction model, ULAM was the only
significant species. My question is this: which model should I use
when I present this analysis? I know that the interaction model has
the lower AIC (1696.6 vs. 1796.0). Should I base my choice solely on
AIC? The reason I'm asking is that the second model has only one
significant interaction term, fewer significant terms overall, and
three undefined terms.
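
In case anyone wants to experiment without my data, here is a self-contained sketch of how the three models can be fit and compared. The data frame `trees` is simulated and is NOT my real data set; the level names "SiteA" and "SpeciesA" are hypothetical placeholders for the reference levels, and the simulation coefficients are arbitrary.

```r
set.seed(1)

# Simulated stand-in data: 545 trees, matching the sample size implied by the
# null deviance df (544). "SiteA" and "SpeciesA" are hypothetical names.
n       <- 545
sites   <- c("SiteA", "Huffman Dam", "Narrows", "Sugar Creek")
species <- c("SpeciesA", "FRAM", "PLOC", "ULAM")
trees <- data.frame(
  Site    = factor(sample(sites, n, replace = TRUE), levels = sites),
  Species = factor(sample(species, n, replace = TRUE), levels = species),
  DBH     = runif(n, 5, 120)
)
# Arbitrary simulation model: vine counts depend on DBH only.
trees$Total.vines <- rpois(n, lambda = exp(-1 + 0.02 * trees$DBH))

# The three candidate models described above.
m_additive <- glm(Total.vines ~ Site + Species + DBH,
                  family = poisson, data = trees)
m_two_way  <- glm(Total.vines ~ Site * Species + DBH,
                  family = poisson, data = trees)
m_full     <- glm(Total.vines ~ Site * Species * DBH,
                  family = poisson, data = trees)

# Side-by-side AIC values and sequential likelihood-ratio tests.
aic_table <- AIC(m_additive, m_two_way, m_full)
print(aic_table)
print(anova(m_additive, m_two_way, m_full, test = "Chisq"))

# Trees per Site x Species combination; a combination with zero trees
# leaves the corresponding interaction coefficient inestimable (NA).
cell_counts <- table(trees$Site, trees$Species)
print(cell_counts)
```

With my real data, the same `table()` call on Site and Species should show whether the three undefined terms correspond to combinations that have no trees.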
Thanks for any guidance you can give to someone running his first
Poisson regression.
Jim Milks
Graduate Student
Environmental Sciences Ph.D. Program
136 Biological Sciences
Wright State University
3640 Colonel Glenn Hwy
Dayton, OH 45435