Dear all,

I am using the bigglm package to fit a few GLM's to a large dataset (3 million rows, 6 columns). While trying to fit a Poisson GLM I noticed that the coefficient estimates were very different from what I obtained when estimating the model on a smaller dataset using glm(), I wrote a very basic toy example to compare the results of bigglm() against a glm() call. Consider the following code:


> require(biglm)
> options(digits=6, scipen=3, contrasts = c("contr.treatment", "contr.poly")) > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)), ttment=gl(2,50000))
> m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
> m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
> summary(m1)

<snipped output for this email>
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.30305    0.00141    1629   <2e-16 ***
ttment2      0.40429    0.00183     221   <2e-16 ***

    Null deviance: 151889  on 99999  degrees of freedom
Residual deviance: 101848  on 99998  degrees of freedom
AIC: 533152

> summary(m1big)
Large data regression model: bigglm(y ~ ttment, data = dat, family = poisson(link = "log"))
Sample size =  100000
             Coef  (95%   CI)    SE p
(Intercept) 2.651 2.650 2.653 0.001 0
ttment2     4.346 4.344 4.348 0.001 0

> m1big$deviance
[1] 287158986


Notice that the coefficients and deviance are quite different in the model estimated using bigglm(). If I change the chunk to seq(1000,10000,1000) the estimates remain the same.

Can someone help me understand what is causing these differences?

Here is my version info:

> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          8.1
year           2008
month          12
day            22
svn rev        47281
language       R
version.string R version 2.8.1 (2008-12-22)


Many thanks in advance for your help,

Francisco

--
Francisco J. Zagmutt
Vose Consulting
2891 20th Street
Boulder, CO, 80304
USA
www.voseconsulting.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to