Thank you Mr. Lumley and Mr. Greg. That was helpful.
Regards
Utkarsh
Thomas Lumley wrote:
On Fri, 3 Jul 2009, utkarshsinghal wrote:
Hi Sir,
Thanks for making the package available to us. I am facing a few problems
and would appreciate some hints:
Problem-1:
The model summary and residual deviance matched (in the mail below)
but I didn't understand why AIC is still different.
AIC(m1)
[1] 532965
AIC(m1big_longer)
[1] 101442.9
That's because AIC.default uses the unnormalized loglikelihood and
AIC.biglm uses the deviance. Only differences in AIC between models
are meaningful, not individual values.
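To illustrate the point about constants, here is a minimal base-R sketch. It does not use biglm, and the helper aic_from_deviance() is a hypothetical name written for this example, not the package's code. It assumes a Poisson model, where the deviance equals minus twice the log-likelihood plus a term that depends only on the data, so the two flavours of AIC differ by the same constant for every model fitted to the same data.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rpois(n, exp(0.3 + 0.5 * x1))

fit_small <- glm(y ~ x1,      family = poisson())
fit_large <- glm(y ~ x1 + x2, family = poisson())

# Likelihood-based AIC, as returned by AIC() for a glm object.
aic_loglik <- c(AIC(fit_small), AIC(fit_large))

# Deviance-based AIC (hypothetical helper): deviance + 2 * number of coefficients.
aic_from_deviance <- function(fit) deviance(fit) + 2 * length(coef(fit))
aic_deviance <- c(aic_from_deviance(fit_small), aic_from_deviance(fit_large))

# The individual values differ by a constant that depends only on the data ...
offset <- aic_loglik - aic_deviance
print(offset)

# ... so the difference between the two models is the same either way.
print(diff(aic_loglik))
print(diff(aic_deviance))
```

The individual values disagree, but the model comparison does not, which is the only use AIC is meant for.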
Problem-2:
The chunksize argument is there in bigglm but not in biglm; consequently,
update.biglm is there, but not update.bigglm.
Is my observation correct? If yes, why is this difference?
Because update.bigglm is impossible.
Fitting a glm requires iteration, which means that it requires
multiple passes through the data. Fitting a linear model requires only
a single pass. update.biglm can take a fitted or partially fitted
biglm and add more data. To do the same thing for a bigglm you would
need to start over again from the beginning of the data set.
To fit a glm, you need to specify a data source that bigglm() can
iterate over. You do this with a function that can be called
repeatedly to return the next chunk of data.
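A minimal sketch of such a data function, written in base R so it runs without the package. make_chunk_reader() is a hypothetical helper name. The convention assumed here (a single `reset` argument, with NULL returned once the data are exhausted) follows my reading of the bigglm help page and should be checked against the installed version.

```r
# Returns a closure that hands out successive chunks of a data frame.
make_chunk_reader <- function(df, chunksize) {
  pos <- 0                                   # rows already handed out
  function(reset = FALSE) {
    if (reset) {                             # restart from the first row
      pos <<- 0
      return(invisible(NULL))
    }
    if (pos >= nrow(df)) return(NULL)        # no more data in this pass
    rows <- (pos + 1):min(pos + chunksize, nrow(df))
    pos <<- pos + length(rows)
    df[rows, , drop = FALSE]
  }
}

set.seed(2)
dat <- data.frame(x = rnorm(10))
dat$y <- rpois(10, exp(0.2 + 0.5 * dat$x))

next_chunk <- make_chunk_reader(dat, chunksize = 4)
next_chunk(reset = TRUE)
while (!is.null(chunk <- next_chunk())) {
  print(nrow(chunk))                         # chunk sizes for one pass
}

# Assumed usage with the package (not run here):
# library(biglm)
# fit <- bigglm(y ~ x, data = next_chunk, family = poisson())
```

In practice the closure would read from a file or database connection rather than from a data frame already in memory; the in-memory version only shows the calling convention.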
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlum...@u.washington.edu University of Washington, Seattle
I don't know why the AIC is different, but remember that there are multiple
definitions for AIC (generally differing in the constant added) and it may just
be a difference in the constant, or it could be that you have not fit the whole
dataset (based on your other question).
For an lm model biglm only needs to make a single pass through the data. This
was the first function written for the package and the update mechanism was an
easy way to write the function (and still works well).
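Why a single pass is enough for a linear model can be sketched in base R. The example below accumulates the cross-products X'X and X'y chunk by chunk and solves the normal equations at the end. This is a deliberately simple stand-in chosen for illustration; it is not a claim about the numerical method biglm uses internally, and all variable names are made up for the example.

```r
set.seed(42)
n   <- 1000
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

# Pretend the data arrive in four chunks.
chunks <- split(dat, rep(1:4, each = n / 4))

xtx <- matrix(0, 2, 2)   # running sum of X'X
xty <- matrix(0, 2, 1)   # running sum of X'y
for (chunk in chunks) {
  X   <- cbind(1, chunk$x)            # design matrix for this chunk
  xtx <- xtx + crossprod(X)
  xty <- xty + crossprod(X, chunk$y)
}

beta_chunked <- drop(solve(xtx, xty))             # each chunk seen exactly once
beta_full    <- unname(coef(lm(y ~ x, data = dat)))

print(beta_chunked)
print(beta_full)
```

Because the running sums summarise everything seen so far, a further chunk can be added later without revisiting the earlier ones, which is the behaviour the update mechanism relies on.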
The bigglm function came later. Models other than Gaussian require multiple
passes through the data, so instead of the update mechanism that biglm uses,
bigglm requires the data argument to be a function that returns the next
chunk of data and can restart from the beginning of the dataset.
Also note that the bigglm function by default makes only a few passes through
the data. This is usually good enough, but in some cases you may need to
increase the number of passes.
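The need for repeated passes, and for a cap on them, can be seen in a small base-R sketch of iteratively reweighted least squares for a Poisson model with a log link. The names maxit and tol are local to this example; as far as I know bigglm exposes a similar iteration limit through its maxit argument, but check the help page for your version. This is an illustration of the idea, not the package's implementation.

```r
set.seed(7)
n   <- 600
dat <- data.frame(x = rnorm(n))
dat$y <- rpois(n, exp(0.5 + 0.5 * dat$x))
chunks <- split(dat, rep(1:3, each = n / 3))

beta   <- c(0, 0)    # starting values
maxit  <- 25         # cap on the number of passes (local to this sketch)
tol    <- 1e-10
passes <- 0

for (iter in seq_len(maxit)) {
  xtwx <- matrix(0, 2, 2)
  xtwz <- matrix(0, 2, 1)
  for (chunk in chunks) {                 # one full pass over the data
    X   <- cbind(1, chunk$x)
    eta <- drop(X %*% beta)               # linear predictor at current beta
    mu  <- exp(eta)
    z   <- eta + (chunk$y - mu) / mu      # working response
    w   <- mu                             # working weights (Poisson, log link)
    xtwx <- xtwx + crossprod(X, w * X)
    xtwz <- xtwz + crossprod(X, w * z)
  }
  passes   <- passes + 1
  beta_new <- drop(solve(xtwx, xtwz))
  converged <- max(abs(beta_new - beta)) < tol
  beta <- beta_new
  if (converged) break
}

beta_glm <- unname(coef(glm(y ~ x, data = dat, family = poisson())))
print(passes)
print(beta)
print(beta_glm)
```

The weights and working response depend on the current coefficients, so every iteration has to read all the chunks again. If the cap is reached before the coefficients settle, the result is only approximate, which is when raising the number of passes helps.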
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help