Re: [R] Problem with zero-inflated negative binomial model in sediment river dynamics

Lauria, Valentina Wed, 14 Aug 2013 12:22:31 -0700

Dear Brian and Achim,

Many thanks for your reply and help; it is very much appreciated!


All the best,
Valentina

Dr. Valentina Lauria
Postdoctoral researcher
Room 118, Martin Ryan Institute
Department of Earth and Ocean Sciences
National University of Ireland, Galway
Ireland

www.nephrops.eu<http://www.nephrops.eu>
________________________________
From: Cade, Brian [ca...@usgs.gov]
Sent: 14 August 2013 16:15
To: Achim Zeileis
Cc: Lauria, Valentina; r-help@R-project.org
Subject: Re: [R] Problem with zero-inflated negative binomial model in sediment 
river dynamics

Z is correct, of course.  I was just being a little too simplistic in my 
explanation trying to emphasize the reversal of signs of the coefficients in 
the logistic regression part of the zero-inflated model.

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov<mailto:brian_c...@usgs.gov>
tel:  970 226-9326



On Wed, Aug 14, 2013 at 4:07 AM, Achim Zeileis 
<achim.zeil...@uibk.ac.at<mailto:achim.zeil...@uibk.ac.at>> wrote:
On Tue, 13 Aug 2013, Cade, Brian wrote:

Lauria:  For historical reasons the logistic regression (binomial with
logit link) model portion of a zero-inflated count model is usually
structured to predict the probability of the 0 counts rather than the
nonzero (>=1) counts so the coefficients will be the negative of what you
expect based on the count model portion (as in your output).  It is simple
to interpret the probability of the logistic regression portion as the
probability of the nonzero counts by just taking the negative of the
coefficient estimates provided for the probability of the zero counts.

This is a common misinterpretation but not quite correct.

The zero-inflation model is a mixture model of two components: (1) a count 
component (Poisson, NB, ...), and (2) a zero mass component (i.e., zero with 
probability 1). Hence, the observed zeros in the data can come from both 
sources: either they are "random" zeros from component (1) or "excess" zeros 
from component (2).

The binomial zero-inflation part of the model predicts the probability that a 
given observation belongs to component (2), that is, the probability of an 
"excess zero". But this is _not_ the probability of observing a zero in the 
data (which is larger than the excess zero probability).
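
As a rough numerical illustration of this distinction, the base-R sketch below
plugs the coefficient estimates from the zeroinfl() output in the original post
(quoted further down in this thread) into the mixture formula
P(Y = 0) = pi + (1 - pi) * f(0). The discharge value of 8 is a made-up input
chosen only for illustration, and the variable names are hypothetical.

```r
# Coefficient estimates copied from the zeroinfl() summary quoted in this thread.
count_coef <- c(intercept = 2.557066, discharge = 0.064698)
zero_coef  <- c(intercept = 13.01011, discharge = -1.64293)
theta      <- 0.4604

# Hypothetical discharge value, chosen only for illustration.
discharge <- 8

# Probability of an "excess" zero, i.e. membership in the zero-mass component (2).
pi_excess <- plogis(zero_coef[["intercept"]] + zero_coef[["discharge"]] * discharge)

# Mean of the negative binomial count component (1), log link.
mu <- exp(count_coef[["intercept"]] + count_coef[["discharge"]] * discharge)

# Probability of a "random" zero produced by the count component.
p0_count <- dnbinom(0, size = theta, mu = mu)

# Probability of observing a zero in the data: excess zeros plus random zeros.
p_zero <- pi_excess + (1 - pi_excess) * p0_count

print(c(excess_zero = pi_excess, observed_zero = p_zero))
```

The second number exceeds the first whenever the count component puts positive
mass on zero, which is why negating the zero-inflation coefficients does not
give the probability of a nonzero count.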

If you want a model that first models zero vs. non-zero and second the non-zero 
counts, use the hurdle model. This has exactly the interpretation you describe 
above.
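
To make the contrast concrete, here is a small base-R sketch of a hurdle
probability mass function with a negative binomial count part. The function
name dhurdle_nb and all parameter values are hypothetical and not part of pscl;
in pscl the model itself would be fitted with hurdle().

```r
# Hurdle pmf: a binary part decides zero vs. non-zero, and a zero-truncated
# negative binomial describes the positive counts.
# p_nonzero: probability of a non-zero count; mu, theta: NB mean and dispersion.
dhurdle_nb <- function(y, p_nonzero, mu, theta) {
  f0 <- dnbinom(0, size = theta, mu = mu)
  ifelse(y == 0,
         1 - p_nonzero,
         p_nonzero * dnbinom(y, size = theta, mu = mu) / (1 - f0))
}

# Hypothetical parameter values, for illustration only.
p_nonzero <- 0.45
mu <- 20
theta <- 0.46

# All zeros come from the binary part, so P(Y = 0) is exactly 1 - p_nonzero.
p_zero <- dhurdle_nb(0, p_nonzero, mu, theta)

# The probabilities over a long range of counts sum to (nearly) one.
total <- sum(dhurdle_nb(0:5000, p_nonzero, mu, theta))
print(c(p_zero = p_zero, total = total))
```

Because the count part is truncated at zero, the binary part alone determines
the zero probability, so its coefficients can be read directly as "zero vs.
non-zero".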

Best,
Z

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov<mailto:ca...@usgs.gov> 
<brian_c...@usgs.gov<mailto:brian_c...@usgs.gov>>
tel:  970 226-9326



On Tue, Aug 13, 2013 at 9:06 AM, Lauria, Valentina <
valentina.lau...@nuigalway.ie<mailto:valentina.lau...@nuigalway.ie>> wrote:

Dear All,

I am running a negative binomial model in R using the package pscl in order
to estimate bed sediment movements versus river discharge. Currently we
have deployed 4 different plates to test whether a combination of more than
one plate would better describe the sediment movements when the river
discharge changes over time.

My data are positively skewed and zero-inflated. I ran both zero-inflated
Poisson and zero-inflated negative binomial regressions and compared them
using the Vuong test, which showed that the negative binomial works better
than a simple zero-inflated Poisson.

My models look like:


1) plate 1 ~ river discharge
2) (plate 1 + plate 2) ~ river discharge
3) (plate 1 + plate 2 + plate 3) ~ river discharge
4) (plate 1 + plate 2 + plate 3 + plate 4) ~ river discharge


My main problem, as I am new to this type of model, is that I get a
different sign for the coefficient of discharge in the two parts of the
output of the zero-inflated negative binomial model (please see below). What
does this mean? Also, how could I compare the different models (1-4), i.e.
what tells me which is performing best? Thank you very much in advance for
any comments and suggestions!!

Kind Regards,
Valentina


Call:
zeroinfl(formula = plate1 ~ discharge, data = datafit_plates, dist = "negbin",
    EM = TRUE)

Pearson residuals:
    Min      1Q  Median      3Q     Max
-0.6770 -0.3564 -0.2101 -0.0814 12.3421

Count model coefficients (negbin with log link):
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.557066   0.036593   69.88   <2e-16 ***
discharge    0.064698   0.001983   32.63   <2e-16 ***
Log(theta)  -0.775736   0.012451  -62.30   <2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 13.01011    0.22602   57.56   <2e-16 ***
discharge   -1.64293    0.03092  -53.14   <2e-16 ***

Theta = 0.4604
Number of iterations in BFGS optimization: 1
Log-likelihood: -6.933e+04 on 5 Df






        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org<mailto:R-help@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

