Re: [R] Zero inflated: is there a limit to the level of inflation

Marc Schwartz Tue, 26 Jun 2012 14:33:30 -0700

On Jun 26, 2012, at 2:10 PM, SSimek wrote:

> Hello, 
> 
> I have count data that illustrate the presence or absence of individuals in
> my study population. I created a grid of cells across the study area and
> calculated a count value for each individual per season per year for each
> grid cell. The count value is the number of times an individual was present
> in each grid cell. For illustration, my data columns look something like
> this and are repeated for each individual:
> 
> Cell_ID Param1      Param2  Param3   Param4   COUNT  Name  Year  Season  Cov
> 1       160.565994  729.08  1503     7930.3   0      AA    2010  AUT     Open
> 1       160.565994  729.08  1503     7930.3   22     AA    2011  SPR     Open
> 1       160.565994  729.08  1503     7930.3   12     AA    2009  SUM     Open
> 1       160.565994  729.08  1503     7930.3   0      AA    2010  SUM     Open
> 2       169.427001  491.87  1503.31  5101.09  0      AA    2010  AUT     oldHard
> 2       169.427001  491.87  1503.31  5101.09  16     AA    2011  SPR     oldHard
> 2       169.427001  491.87  1503.31  5101.09  0      AA    2009  SUM     oldHard
> 2       169.427001  491.87  1503.31  5101.09  0      AA    2010  SUM     oldHard
> …
> 563     86.777099   612.69  977      4474.6   62     AA    2010  AUT     Water
> 563     86.777099   612.69  977      4474.6   12     AA    2011  SPR     Water
> 563     86.777099   612.69  977      4474.6   55     AA    2009  SUM     Water
>
> 1       160.565994  729.08  1503     7930.3   0      BB    2010  SUM     Open
> 2       169.427001  491.87  1503.31  5101.09  72     BB    2010  SUM     oldHard
> 5       160.75      614.95  1503.31  2878.98  16     BB    2010  SUM     medHard
> 6       170.404998  510.58  1489.44  743.14   0      BB    2010  SUM     Water
> …
> 563     86.777099   612.69  977      4474.6   0      BB    2010  SUM     Water
>
> 1       160.565994  729.08  1503     7930.3   14     C     2005  AUT     Open
> 1       160.565994  729.08  1503     7930.3   0      C     2006  AUT     Open
> 1       160.565994  729.08  1503     7930.3   0      C     2006  SPR     Open
> 1       160.565994  729.08  1503     7930.3   56     C     2007  SPR     Open
> 1       160.565994  729.08  1503     7930.3   0      C     2006  SUM     Open
> 2       169.427001  491.87  1503.31  5101.09  124    C     2005  AUT     oldHard
> 2       169.427001  491.87  1503.31  5101.09  231    C     2006  AUT     oldHard
> 2       169.427001  491.87  1503.31  5101.09  889    C     2006  SPR     oldHard
> 2       169.427001  491.87  1503.31  5101.09  0      C     2007  SPR     oldHard
> …
> 563     86.777099   612.69  977      4474.6   0      C     2005  AUT     Water
> 563     86.777099   612.69  977      4474.6   231    C     2006  AUT     Water
> 563     86.777099   612.69  977      4474.6   185    C     2006  SPR     Water
> 563     86.777099   612.69  977      4474.6   123    C     2007  SPR     Water
> 563     86.777099   612.69  977      4474.6   52     C     2006  SUM     Water
> 
> 
> 
> I have 563 grid cells across my study area, and each individual has 1-563
> cells associated with it for each year and season in which it was monitored.
> Therefore my grid cells are repeated. I end up with 71,000 records, of which
> 925 have a Count value > 0 and 70,075 have a Count value = 0.
> 
> I wanted to run a zero-inflated Poisson mixed model to estimate the effects
> of the parameters, with individual as the random effect. But I have been
> advised two things:
>
> 1. I cannot run a zero-inflated Poisson model because my data are too
> "extremely" inflated (i.e. 70,075 vs 925) and
> 
> 2. I cannot run the model with each cell repeated for each individual. I am
> told the model doesn't recognize that Cell_ID #1 for individual "A" is the
> same Cell_ID #1 for individual "B".
> 
> Does anyone know if either or both of these points are true? I would
> appreciate any thoughts, advice, or suggestions. 
> 
> Thanks!
> 
> -Stephanie



Hi Stephanie,

Some comments:

1. You should think about, or at least be open to, a zero-inflated negative
binomial distribution rather than a zero-inflated Poisson.

2. You should at least review the vignette for the pscl CRAN package, which
provides standard fixed-effects models and related functions for count-based
data and, importantly, some good conceptual content:

  http://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf

3. Given the repeated measures framework and correlation issues you likely 
have, you should subscribe to and re-post your query to the R-sig-mixed-models 
list:

  https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

which will avail you of experts in the field. 

4. There is also a draft FAQ for mixed models here:

  http://glmm.wikidot.com/faq

which I believe is maintained by Ben Bolker, who actively participates in the
above list. Based upon the content there, I suspect that you will be pointed to
the glmmADMB package, which is on R-Forge
(http://glmmadmb.r-forge.r-project.org/) and can handle at least some types of
zero-inflated mixed-effects models.

5. If all else fails, just to plant a seed, you might want to consider a
mixed-effects logistic regression model with a binary response (for example,
present vs. absent in a cell), since you appear to have a relatively small
"event" incidence in your data. The above list will also be helpful in that
setting, and you would likely be pointed to the glmer() function in the lme4
package for that application, which provides for GLMs in a mixed-effects
framework.
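
A minimal sketch of points 1 and 5, assuming simulated data in place of yours.
The column names COUNT, Name, Cell_ID and Param1 follow your table; the
simulated values, the helper column `present`, and the model formulas are
illustrative assumptions only, not a recommended final model. The model fits
run only if pscl and lme4 are installed.

```r
# Simulated stand-in for the posted data (values are made up, not real).
set.seed(1)
n_ind  <- 20    # individuals (Name)
n_cell <- 100   # grid cells (Cell_ID)
dat <- expand.grid(Cell_ID = factor(seq_len(n_cell)),
                   Name    = factor(paste0("ID", seq_len(n_ind))))

# One covariate value per cell, one random intercept per individual.
cell_cov   <- rnorm(n_cell)
ind_eff    <- rnorm(n_ind, sd = 0.5)
dat$Param1 <- cell_cov[as.integer(dat$Cell_ID)]
mu <- exp(1.5 + 0.5 * dat$Param1 + ind_eff[as.integer(dat$Name)])

# Zero inflation: 80% of records are forced to zero, the rest are
# negative binomial counts (which can also be zero).
always_zero <- rbinom(nrow(dat), size = 1, prob = 0.8) == 1
dat$COUNT   <- ifelse(always_zero, 0L,
                      rnbinom(nrow(dat), size = 1, mu = mu))

# How inflated are the data? Compare with the posted figures.
prop_zero        <- mean(dat$COUNT == 0)
poster_prop_zero <- 70075 / 71000   # about 0.987 zeros in the posted data

# Point 5: collapse the count to a binary presence/absence response.
dat$present <- as.integer(dat$COUNT > 0)

# Point 1: zero-inflated negative binomial with pscl (fixed effects only,
# so the grouping by individual is ignored here).
if (requireNamespace("pscl", quietly = TRUE)) {
  zinb <- pscl::zeroinfl(COUNT ~ Param1 | 1, data = dat, dist = "negbin")
  print(summary(zinb))
}

# Point 5: mixed-effects logistic regression with a random intercept
# for each individual.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_bin <- lme4::glmer(present ~ Param1 + (1 | Name),
                         data = dat, family = binomial)
  print(summary(fit_bin))
}
```

With your real data, the binary version would be modelling 925 "events" out of
71,000 records (about 1.3%), so it is worth checking how many non-zero records
each individual contributes before fitting.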

Regards,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
