On Wed, 14 Aug 2013, Cade, Brian wrote:

Z is correct, of course.  I was just being a little too simplistic in my explanation trying to emphasize the reversal of signs of the coefficients in the logistic regression part of the zero-inflated model.

When users ask me what the binary part of the two types of count models means, I always say:

- In the zero-inflation model, the binary model predicts the probability of _zero inflation_ (= excess zeros).

- In the hurdle model, the binary model predicts the probability for _hurdle crossing_ (= non-zero response).

To me this always seemed natural, even if the sign reversal in the zero-inflation model may be surprising at first sight...
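
The sign reversal can be checked directly from the coefficients in the output quoted further down. What follows is only a sketch in base R, not part of the original analysis: the discharge values are made-up illustration points (the range of the actual data is not shown in this thread), and the variable names are assumptions chosen for the example.

```r
# Coefficients copied from the zeroinfl() summary quoted in this thread
zero_coef  <- c(intercept = 13.01011, discharge = -1.64293)  # logit link
count_coef <- c(intercept = 2.557066, discharge = 0.064698)  # log link

# Hypothetical discharge values, chosen only for illustration
discharge <- c(6, 8, 10)

# Zero-inflation part: probability of an excess zero (not of a non-zero count)
pi_excess <- plogis(zero_coef[["intercept"]] + zero_coef[["discharge"]] * discharge)

# Count part: mean of the negative binomial component
mu_count <- exp(count_coef[["intercept"]] + count_coef[["discharge"]] * discharge)

result <- data.frame(discharge, pi_excess, mu_count)
print(result)
```

As discharge increases, the excess-zero probability falls while the mean of the count component rises, so the negative coefficient in the zero-inflation part and the positive coefficient in the count part point in the same substantive direction.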

hth,
Z

Brian  

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov
tel:  970 226-9326



On Wed, Aug 14, 2013 at 4:07 AM, Achim Zeileis <achim.zeil...@uibk.ac.at>
wrote:
      On Tue, 13 Aug 2013, Cade, Brian wrote:

            Lauria:  For historical reasons the logistic
            regression (binomial with
            logit link) model portion of a zero-inflated count
            model is usually
            structured to predict the probability of the 0
            counts rather than the
            nonzero (>=1) counts so the coefficients will be the
            negative of what you
            expect based on the count model portion (as in your
            output).  It is simple
            to interpret the probability of the logistic
            regression portion as the
            probability of the nonzero counts by just taking the
            negative of the
            coefficient estimates provided for the probability
            of the zero counts.


      This is a common misinterpretation but not quite correct.

      The zero-inflation model is a mixture model of two components:
      (1) a count component (Poisson, NB, ...), and (2) a zero mass
      component (i.e., zero with probability 1). Hence, the observed
      zeros in the data can come from both sources: either they are
      "random" zeros from component (1) or "excess" zeros from
      component (2).

      The binomial zero-inflation part of the model predicts the
      probability that a given observation belongs to component (2),
      i.e., the probability of an "excess zero". But this is _not_ the
      probability of observing a zero in the data (which is larger
      than the excess zero probability).
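
      To illustrate the difference between the excess-zero probability and the probability of observing a zero, here is a minimal base R sketch. It assumes a Poisson count component for simplicity, and the values of pi_excess and mu are hypothetical, not estimates from the data in this thread.

```r
# Hypothetical values for a single observation (not taken from the data)
pi_excess <- 0.3   # probability of belonging to the zero-mass component
mu        <- 2     # mean of the Poisson count component

# "Random" zeros produced by the count component
p_zero_count <- dpois(0, lambda = mu)

# Probability of observing a zero: excess zeros plus random zeros
p_zero_total <- pi_excess + (1 - pi_excess) * p_zero_count

# Simulation check of the mixture
set.seed(1)
n <- 100000
is_excess <- rbinom(n, size = 1, prob = pi_excess) == 1
y <- ifelse(is_excess, 0L, rpois(n, lambda = mu))
p_zero_sim <- mean(y == 0)

print(c(excess = pi_excess, total = p_zero_total, simulated = p_zero_sim))
```

      The total zero probability exceeds the excess-zero probability whenever the count component itself can produce zeros, which is why the zero-inflation coefficients cannot be read as a zero vs. non-zero model.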

      If you want a model that first models zero vs. non-zero and
      second the non-zero counts, use the hurdle model. This has
      exactly the interpretation you describe above.

      Best,
      Z

            Brian

            Brian S. Cade, PhD

            U. S. Geological Survey
            Fort Collins Science Center
            2150 Centre Ave., Bldg. C
            Fort Collins, CO  80526-8818

            email:  ca...@usgs.gov <brian_c...@usgs.gov>
            tel:  970 226-9326



            On Tue, Aug 13, 2013 at 9:06 AM, Lauria, Valentina <
            valentina.lau...@nuigalway.ie> wrote:

                  Dear All,

                  I am running a negative binomial model
                  in R using the package pscl in order
                  to estimate bed sediment movements
                  versus river discharge. Currently we
                  have deployed 4 different plates to test
                  if a combination of more than one
                  plate would better describe the sediment
                  movements when the river discharge
                  changes over time.

                  My data are positively skewed and
                  zero-inflated. I did run both
                  zero-inflated Poisson and zero-inflated
                  negative binomial regression and
                  compared them using the Vuong test, which
                  showed that the negative binomial
                  works better than a simple zero-inflated
                  Poisson.

                  My models look like:


                  1) plate 1 ~ river discharge
                  2) (plate 1 + plate 2) ~ river discharge
                  3) (plate 1 + plate 2 + plate 3) ~ river discharge
                  4) (plate 1 + plate 2 + plate 3 + plate 4) ~ river discharge


                  My main problem, as I am new to this
                  type of model, is that I get a
                  different sign for the coefficient of
                  discharge in the zero-inflation part
                  of the zero-inflated negative binomial
                  model (please see below). What does this
                  mean? Also how could I compare the
                  different models (1-4) i.e. what tells
                  me which is performing best? Thank you
                  very much in advance for any
                  comments and suggestions!!

                  Kind Regards,
                  Valentina


                  Call:
                  zeroinfl(formula = plate1 ~ discharge, data = datafit_plates,
                      dist = "negbin", EM = TRUE)

                  Pearson residuals:
                      Min      1Q  Median      3Q     Max
                  -0.6770 -0.3564 -0.2101 -0.0814 12.3421

                  Count model coefficients (negbin with log link):
                               Estimate Std. Error z value Pr(>|z|)
                  (Intercept)  2.557066   0.036593   69.88   <2e-16 ***
                  discharge    0.064698   0.001983   32.63   <2e-16 ***
                  Log(theta)  -0.775736   0.012451  -62.30   <2e-16 ***

                  Zero-inflation model coefficients (binomial with logit link):
                              Estimate Std. Error z value Pr(>|z|)
                  (Intercept) 13.01011    0.22602   57.56   <2e-16 ***
                  discharge   -1.64293    0.03092  -53.14   <2e-16 ***

                  Theta = 0.4604
                  Number of iterations in BFGS optimization: 1
                  Log-likelihood: -6.933e+04 on 5 Df







                  ______________________________________________
                  R-help@r-project.org mailing list
                  https://stat.ethz.ch/mailman/listinfo/r-help
                  PLEASE do read the posting guide
                  http://www.R-project.org/posting-guide.html
                  and provide commented, minimal,
                  self-contained, reproducible code.




