Dear Brian and Achim,

Many thanks for your reply and help; it is very much appreciated!
All the best,
Valentina

Dr. Valentina Lauria
Postdoctoral researcher
Room 118, Martin Ryan Institute
Department of Earth and Ocean Sciences
National University of Ireland, Galway
Ireland
www.nephrops.eu

________________________________
From: Cade, Brian [ca...@usgs.gov]
Sent: 14 August 2013 16:15
To: Achim Zeileis
Cc: Lauria, Valentina; r-help@R-project.org
Subject: Re: [R] Problem with zero-inflated negative binomial model in sediment river dynamics

Z is correct, of course. I was just being a little too simplistic in my explanation trying to emphasize the reversal of signs of the coefficients in the logistic regression part of the zero-inflated model.

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: ca...@usgs.gov
tel: 970 226-9326

On Wed, Aug 14, 2013 at 4:07 AM, Achim Zeileis <achim.zeil...@uibk.ac.at> wrote:

On Tue, 13 Aug 2013, Cade, Brian wrote:

Lauria: For historical reasons the logistic regression (binomial with logit link) model portion of a zero-inflated count model is usually structured to predict the probability of the 0 counts rather than the nonzero (>=1) counts so the coefficients will be the negative of what you expect based on the count model portion (as in your output). It is simple to interpret the probability of the logistic regression portion as the probability of the nonzero counts by just taking the negative of the coefficient estimates provided for the probability of the zero counts.

This is a common misinterpretation but not quite correct. The zero-inflation model is a mixture model of two components: (1) a count component (Poisson, NB, ...), and (2) a zero mass component (i.e., zero with probability 1). Hence, the observed zeros in the data can come from both sources: either they are "random" zeros from component (1) or "excess" zeros from component (2).
The binomial zero-inflation part of the model predicts the probability that a given observation belongs to component (2), that is, the probability of an "excess zero". But this is _not_ the probability of observing a zero in the data (which is larger than the excess zero probability).

If you want a model that first models zero vs. non-zero and second the non-zero counts, use the hurdle model. This has exactly the interpretation you describe above.

Best,
Z

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: ca...@usgs.gov <brian_c...@usgs.gov>
tel: 970 226-9326

On Tue, Aug 13, 2013 at 9:06 AM, Lauria, Valentina <valentina.lau...@nuigalway.ie> wrote:

Dear All,

I am running a negative binomial model in R using the package pscl in order to estimate bed sediment movements versus river discharge. Currently we have deployed 4 different plates to test if a combination of more than one plate would better describe the sediment movements when the river discharge changes over time. My data are positively skewed and zero-inflated. I ran both zero-inflated Poisson and zero-inflated negative binomial regressions and compared them using the Vuong test, which showed that the negative binomial works better than a simple zero-inflated Poisson.

My models look like:

1) plate1 ~ river discharge
2) (plate 1 + plate 2) ~ river discharge
3) (plate 1 + plate 2 + plate 3) ~ river discharge
4) (plate 1 + plate 2 + plate 3 + plate 4) ~ river discharge

My main problem, as I am new to this type of model, is that I get a different sign for the coefficient of discharge in the output of the zero-inflated negative binomial model (please see below). What does this mean? Also, how could I compare the different models (1-4), i.e. what tells me which is performing best?

Thank you very much in advance for any comments and suggestions!
Kind Regards,
Valentina

Call:
zeroinfl(formula = plate1 ~ discharge, data = datafit_plates, dist = "negbin", EM = TRUE)

Pearson residuals:
    Min      1Q  Median      3Q     Max
-0.6770 -0.3564 -0.2101 -0.0814 12.3421

Count model coefficients (negbin with log link):
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.557066   0.036593   69.88   <2e-16 ***
discharge    0.064698   0.001983   32.63   <2e-16 ***
Log(theta)  -0.775736   0.012451  -62.30   <2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 13.01011    0.22602   57.56   <2e-16 ***
discharge   -1.64293    0.03092  -53.14   <2e-16 ***

Theta = 0.4604
Number of iterations in BFGS optimization: 1
Log-likelihood: -6.933e+04 on 5 Df

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
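
Editorial illustration (not part of the original messages): the distinction Z draws between the "excess zero" probability and the probability of observing a zero can be checked by hand from the coefficients in the quoted output. The sketch below is plain Python using only the standard library, not pscl itself. It assumes the usual parameterisation of this kind of output (logit link for the zero-inflation part, log link and size parameter theta for the negative binomial part). The discharge values are hypothetical, since the thread gives neither units nor a range.

```python
import math

# Coefficients copied from the summary output quoted in the thread.
COUNT_INTERCEPT = 2.557066
COUNT_DISCHARGE = 0.064698
THETA = 0.4604
ZERO_INTERCEPT = 13.01011
ZERO_DISCHARGE = -1.64293


def excess_zero_prob(discharge):
    """Probability of belonging to the zero-mass component (logit link)."""
    eta = ZERO_INTERCEPT + ZERO_DISCHARGE * discharge
    return 1.0 / (1.0 + math.exp(-eta))


def count_mean(discharge):
    """Mean of the negative binomial count component (log link)."""
    return math.exp(COUNT_INTERCEPT + COUNT_DISCHARGE * discharge)


def nb_zero_prob(mu, theta=THETA):
    """P(Y = 0) under a negative binomial with mean mu and size theta."""
    return (theta / (theta + mu)) ** theta


def observed_zero_prob(discharge):
    """Mixture probability of a zero: excess zeros plus random zeros."""
    pi = excess_zero_prob(discharge)
    return pi + (1.0 - pi) * nb_zero_prob(count_mean(discharge))


# Hypothetical discharge values, chosen only for illustration.
for q in (4.0, 8.0, 12.0):
    print(q, round(excess_zero_prob(q), 4), round(observed_zero_prob(q), 4))
```

Under these assumptions the two signs in the output tell a consistent story: as discharge rises, the negative zero-inflation coefficient makes an excess zero less likely, and the positive count coefficient raises the expected count. At every discharge value the probability of observing a zero exceeds the excess-zero probability, because the count component contributes "random" zeros of its own.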