Hi Sir, Thanks for making package available to us. I am facing few problems if you can give some hints:
Problem-1: The model summary and residual deviance matched (in the mail below) but I didn't understand why AIC is still different. > AIC(m1) [1] 532965 > AIC(m1big_longer) [1] 101442.9 Problem-2: chunksize argument is there in bigglm but not in biglm, consequently, udate.biglm is there, but not update.bigglm Is my observation correct? If yes, why is this difference? Regards Utkarsh / / From: Thomas Lumley <tlumley_at_u.washington.edu <mailto:tlumley_at_u.washington.edu?Subject=Re:%20%5BR%5D%20bigglm%28%29%20results%20different%20from%20glm%28%29>> Date: Tue, 17 Mar 2009 00:50:20 -0700 (PDT) This is a surprisingly interesting problem that took a while to debug, because the computations all seemed correct. Your model hasn't converged yet. You can get the right answer either by running longer: /> summary(m1big_longer) / Large data regression model: bigglm(y ~ ttment, data = dat, family = poisson(link = "log"), chunksize = 100000, maxit = 20) Sample size = 100000 Coef (95% CI) SE p (Intercept) 2.304 2.301 2.307 0.001 0 ttment2 0.405 0.401 0.408 0.002 0 or supplying starting values: /> summary(m1big_started) / Large data regression model: bigglm(y ~ ttment, data = dat, family = poisson(link = "log"), chunksize = 100000, start = c(2, 0)) Sample size = 100000 Coef (95% CI) SE p (Intercept) 2.304 2.301 2.307 0.001 0 ttment2 0.405 0.401 0.408 0.002 0 The bug is that you weren't told about the lack of convergence. There is a flag in the object, but it is only set after the model is converged, so it is not there when convergence fails. /> m1big$converged / *NULL * /> m1big_longer$converged / *[1] TRUE * /> m1big_started$converged / *[1] TRUE *For the next version I will make sure there is a clear warning when the model hasn't converged. The default maximum number of iterations is fairly small, by design --- if it isn't working, you want to find out and specify starting values rather than wait for dozens of potentially slow iterations. This strategy obviously breaks down when you don't notice that failure. :( -thomas On Mon, 16 Mar 2009, Francisco J. Zagmutt wrote: > Dear all, <http://tolstoy.newcastle.edu.au/R/e6/help/09/03/8169.html#8182qlink1> /> / /> I am using the bigglm package to fit a few GLM's to a large dataset (3 million / /> rows, 6 columns). While trying to fit a Poisson GLM I noticed that the / /> coefficient estimates were very different from what I obtained when estimating / /> the model on a smaller dataset using glm(), I wrote a very basic toy example to / /> compare the results of bigglm() against a glm() call. Consider the following / /> code: / /> / /> / />> require(biglm) / />> options(digits=6, scipen=3, contrasts = c("contr.treatment", "contr.poly")) / />> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)), ttment=gl(2,50000)) / />> m1 <- glm(y~ttment, data=dat, family=poisson(link="log")) / />> m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log")) / />> summary(m1) / /> / /> <snipped output for this email> / /> Coefficients: / /> Estimate Std. Error z value Pr(>|z|) / /> (Intercept) 2.30305 0.00141 1629 <2e-16 *** / /> ttment2 0.40429 0.00183 221 <2e-16 *** / /> / /> Null deviance: 151889 on 99999 degrees of freedom / /> Residual deviance: 101848 on 99998 degrees of freedom / */> AIC: 533152 / */> / />> summary(m1big) / /> Large data regression model: bigglm(y ~ ttment, data = dat, family = / /> poisson(link = "log")) / /> Sample size = 100000 / /> Coef (95% CI) SE p / /> (Intercept) 2.651 2.650 2.653 0.001 0 / /> ttment2 4.346 4.344 4.348 0.001 0 / /> / />> m1big$deviance / /> [1] 287158986 / /> / /> / /> Notice that the coefficients and deviance are quite different in the model / /> estimated using bigglm(). If I change the chunk to seq(1000,10000,1000) the / /> estimates remain the same. / /> / /> Can someone help me understand what is causing these differences? / /> / /> Here is my version info: / /> / />> version / /> _ / /> platform i386-pc-mingw32 / /> arch i386 / /> os mingw32 / /> system i386, mingw32 / /> status / /> major 2 / /> minor 8.1 / /> year 2008 / /> month 12 / /> day 22 / /> svn rev 47281 / /> language R / /> version.string R version 2.8.1 (2008-12-22) / /> / /> / /> Many thanks in advance for your help, / /> / /> Francisco / /> / /> -- / /> Francisco J. Zagmutt / /> Vose Consulting / /> 2891 20th Street / /> Boulder, CO, 80304 / */> USA / */> www.voseconsulting.com / /> / /> ______________________________________________ / /> R-help_at_r-project.org mailing list / /> https://stat.ethz.ch/mailman/listinfo/r-help / /> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html / /> and provide commented, minimal, self-contained, reproducible code. / /> / Thomas Lumley Assoc. Professor, Biostatistics tlumley_at_u.washington.edu University of Washington, Seattle [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.