Re: [R] glm.nb versus glm estimation of theta.

Achim Zeileis Thu, 13 Aug 2009 16:37:37 -0700

On Thu, 13 Aug 2009, hesicaia wrote:


Hello,

I have a question regarding estimation of the dispersion parameter (theta)
for generalized linear models with the negative binomial error structure. As


The theta is different from the dispersion. In the usual GLM notation:
    E[y] = mu
  VAR[y] = dispersion * V(mu)

The function V() depends on the family and is
  Poisson: V(mu) = mu
  NB:      V(mu) = mu + 1/theta * mu^2

For both models, dispersion is known to be 1 (from the likelihood).However, a quasi-Poisson approach can be adopted where dispersion isestimated (but does not correspond to a specific likelihood).

Thus, dispersion and theta are really different things although both ofthem can be used to capture overdispersion.

I understand, there are two main methods to fit glm's using the nb error
structure in R: glm.nb() or glm() with the negative.binomial(theta) family.
Both functions are implemented through the MASS library. Fitting the model
using these two functions to the same data produces much different results
for me in terms of estimated theta and the coefficients, and I am not sure
why.


...because you tell them to do different things.

the following model:
mod<-glm.nb(count ~ year + season + time + depth,
link="log",data=dat,control=glm.control(maxit=100,trace=T))
estimates theta as 0.0109

This approach estimates theta (= 0.0109) and assumes that dispersion isknown to be 1. The underlying estimated variance function is:

  VAR[y] = mu + 91.74312 * mu^2

while the following model:
mod2<-glm(count ~ year + season + time + depth,
family=negative.binomial(theta=100),link="log",data=dat,control=glm.control(maxit=100,trace=T))
will not accept 0.0109 as theta and instead estimates it as 1271 (these are
fisheries catch data and so are very overdispersed).

This does not estimate theta at all but keeps it fixed (= 100). Bydefault, however, dispersion will be estimated by the summary() method,presumably leading to the value of 1271 you report. The underlyingvariance function would then be

  VAR[y] = 1271 * mu + 12.71 * mu^2

Fitting a quasipoisson model also yields a large dispersion parameter
(1300). The models also produce different coefficients and P-values, which
is disconcerting.


This implies yet another variance function, namely
  VAR[y] = 1300 * mu

If you want to get essentially the same result as
  summary(mod)
from using glm+negative.binomial you can do
  mod0 <- glm(count ~ year + season + time + depth, data = dat,
    family = negative.binomial(theta = mod$theta),
    control = glm.control(maxit = 100))
  summary(mod0, dispersion = 1)
(Note that link = "log" is not needed.)

What am I doing wrong here? I've read through the help sections
(?negative.binomial,?glm.nb, and ?glm) but did not find any answers.

I guess the authors of the "MASS" package would say that the softwareaccompanies a book which should be consulted...and they would be right.Reading the corresponding sections in MASS (the book) will surely behelpful. Consulting McCullagh & Nelder about some general GLM theory can'thurt either. Finally, some extended modeling (including excess zeros) isavailable in

  http://www.jstatsoft.org/v27/i08/
(Apologies for advertising this twice on the same day.)

hth,
Z

Any help and/or input is greatly appreciated!
Daniel.
--
View this message in context: 
http://www.nabble.com/glm.nb-versus-glm-estimation-of-theta.-tp24956438p24956438.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm.nb versus glm estimation of theta.

Reply via email to