Hi, I fit both Poisson and NB (negative binomial) models to some empirical data.
Although both models give me sensible parameter estimates, with the NB models I get three inconsistencies:

- First, the total number of occurrences predicted by the model (i.e. sum(fitted(fit))) is much greater than the total in the data. I realise that, unlike the Poisson model, the NB model does not have to reproduce the observed total exactly, but I believe the over-estimation is too large.
- Sometimes there is a data point whose fitted value is about 1000 times larger than what would be expected.
- Sometimes the model with an offset gives sensible predictions, but if I drop the offset and include log(variable) as an ordinary covariate instead, some data points get fitted values far larger than expected.

I have tried to build an example of these problems. However, I only managed to reproduce the first one (the increase is normally about 20%). The third problem does not show up at all, as the predicted and observed totals are equal in this example.

#-----------------------------------------------------------------------------------
library(MASS)  # provides glm.nb()

# Response variable with "lots" of zeros (I don't want to use hurdle or ZIP models...)
response <- rpois(1000, 1) * sample(rep(0:1, 1000), size = 1000, replace = FALSE)

# Offset, numerical and categorical variables
offset.var  <- sample(rep(1:10, 1000), size = 1000, replace = FALSE)
numerical   <- sample(rep(1:1000, 1000), size = 1000, replace = FALSE)
categorical <- sample(rep(c("A", "B", "C"), 1000), size = 1000, replace = FALSE)

# Dataframe
example.data <- data.frame(offset.var, numerical, categorical, response)

# Fit
fit.po <- glm(response ~ numerical + categorical + offset(log(offset.var)),
              family = "poisson", data = example.data)
fit.nb <- glm.nb(response ~ numerical + categorical + offset(log(offset.var)),
                 data = example.data)
fit.po.non.off <- glm(response ~ numerical + categorical + log(offset.var),
                      family = "poisson", data = example.data)
fit.nb.non.off <- glm.nb(response ~ numerical + categorical + log(offset.var),
                         data = example.data)

# Comparison
sum(response)
sum(fitted(fit.po))
sum(fitted(fit.nb))
sum(fitted(fit.po.non.off))
sum(fitted(fit.nb.non.off))
#-----------------------------------------------------------------------------------

Any thoughts?

Many thanks
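
The second problem (a single data point with a wildly inflated fitted value) does not appear in the totals above, so here is a minimal sketch of a per-observation check that might help locate such points. It re-simulates data of the same shape as the example. The seed and the names `totals`, `ratio` and `worst` are my own assumptions for illustration, and the simulated data may well not show the issue at all.

```r
library(MASS)  # provides glm.nb()

set.seed(42)   # arbitrary seed (an assumption), only so that reruns agree

# Simulated data of the same shape as the example above
n <- 1000
example.data <- data.frame(
  offset.var  = sample(rep(1:10, 1000), size = n),
  numerical   = sample(rep(1:1000, 1000), size = n),
  categorical = sample(rep(c("A", "B", "C"), 1000), size = n),
  response    = rpois(n, 1) * sample(rep(0:1, 1000), size = n)
)

# Poisson and NB fits with the same linear predictor and offset
fit.po <- glm(response ~ numerical + categorical + offset(log(offset.var)),
              family = "poisson", data = example.data)
fit.nb <- glm.nb(response ~ numerical + categorical + offset(log(offset.var)),
                 data = example.data)

# Observed total next to the two fitted totals
totals <- c(observed = sum(example.data$response),
            poisson  = sum(fitted(fit.po)),
            nb       = sum(fitted(fit.nb)))
print(totals)

# Per-observation ratio of NB to Poisson fitted values;
# very large ratios flag the data points worth inspecting
ratio <- fitted(fit.nb) / fitted(fit.po)
worst <- order(ratio, decreasing = TRUE)[1:5]
print(cbind(example.data[worst, ],
            fitted.po = fitted(fit.po)[worst],
            fitted.nb = fitted(fit.nb)[worst],
            ratio     = ratio[worst]))
```

Running the same check on the real data, and on the fits that use log(variable) as a covariate instead of an offset, should show whether the inflated totals come from a handful of observations or from all of them.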