hi Simon yes, I also got the right shape of the mean-variance relation but the wrong estimate of the parameter.
thanks very much Greg > Hi Greg, > > Yes, this sounds right - with quasipoisson gam uses `extended > quasi-likelihood' (see McCullagh and Nelder's GLM book) to allow > estimation of the scale parameter along with the smoothing parameters > via (RE)ML, and it could well be that this gives a biased scale estimate > with low counts (although the shape of the mean variance relationship is > unaffected by this). > > It might well be more sensible to default to a Pearson estimate of the > scale parameter for 'gam' ('bam' and 'gamm' already do this), to avoid > this issue with quasipoisson and low counts. Your email reminded me that > someone had reported rather too high p-values for quasipoisson with > these sort of data, and looking back at my list of open issues I see > that `someone' was you as well! It's possible that moving to a Pearson > estimator of the scale will solve that problem too. > > Thanks for this... very helpful. > > best, > Simon > > > On 05/02/14 12:56, Greg Dropkin wrote: >> thanks Simon >> >> also, it appears at least with ML that the default scale estimate with >> quasipoisson (i.e. using dev) is the scale which minimises the ML value >> of >> the fitted model. So it is the "best" model but doesn't actually give >> the >> correct mean-variance relation. Is that right? >> >> thanks again >> >> Greg >> >>> Greg, >>> >>> The deviance being chi^2 distributed on the residual degrees of freedom >>> works in some cases (mostly where the response itself can be reasonably >>> approximated as Gaussian), but rather poorly in others (noteably low >>> count data). This is what you are seeing in your simulations - in the >>> first case the Poisson mean is relatively high - so the normal >>> approximation to the Poisson is not too bad and the deviance is approx. >>> chi.sq on residual df. In the second case the Poisson mean is low, >>> there >>> are lots of zeroes, and the approximation breaks down. So yes, the >>> Pearson estimator is probably a better bet in the latter case. >>> >>> best, >>> Simom >>> >>> ps. Note that the chi squared approximation for the *difference* in >>> deviance between two nested models does not suffer from this problem. >>> >>> >>> On 04/02/14 09:25, Greg Dropkin wrote: >>>> mgcv: distribution of dev >>>> >>>> hi >>>> >>>> I can't tell if this is a simple error. >>>> I'm puzzled by the distribution of dev when fitting a gam to Poisson >>>> generated data. >>>> I expected dev to be approximately chi-squared on residual d.f., i.e. >>>> about 1000 in each case below. >>>> In particular, the low values in the 3rd and 4th versions would >>>> suggest >>>> scale < 1, yet the data is Poisson generated. >>>> The problem isn't caused by insufficient k values in the smoother. >>>> Does this mean that with sparse data the distribution of dev is no >>>> longer >>>> approx chi-sq on residual df? >>>> Does it mean the scale estimate when fitting quasipoisson should be >>>> the >>>> Pearson version? >>>> >>>> thanks >>>> >>>> greg >>>> >>>> library(mgcv) >>>> set.seed(1) >>>> x1<-runif(1000) >>>> linp<-2+exp(-2*x1)*sin(8*x1) >>>> sum(exp(linp)) >>>> dev1<-dev2<-sums<-rep(0,20) >>>> for (j in 1:20) >>>> { >>>> y<-rpois(1000,exp(linp)) >>>> sums[j]<-sum(y) >>>> m1<-gam(y~s(x1),family="poisson") >>>> m2<-gam(y~s(x1,k=20),family="poisson") >>>> dev1[j]=m1$dev >>>> dev2[j]=m2$dev >>>> } >>>> mean(sums) >>>> sd(sums) >>>> mean(dev1) >>>> sd(dev1) >>>> mean(dev2) >>>> sd(dev2) >>>> >>>> #dev slighly higher than expected >>>> >>>> linpa<--1+exp(-2*x1)*sin(8*x1) >>>> sum(exp(linpa)) >>>> dev1a<-dev2a<-sumsa<-rep(0,20) >>>> for (j in 1:20) >>>> { >>>> y<-rpois(1000,exp(linpa)) >>>> sumsa[j]<-sum(y) >>>> m1<-gam(y~s(x1),family="poisson") >>>> m2<-gam(y~s(x1,k=20),family="poisson") >>>> dev1a[j]=m1$dev >>>> dev2a[j]=m2$dev >>>> } >>>> mean(sumsa) >>>> sd(sumsa) >>>> mean(dev1a) >>>> sd(dev1a) >>>> mean(dev2a) >>>> sd(dev2a) >>>> >>>> #dev a bit lower than expected >>>> >>>> linpb<--1.5+exp(-2*x1)*sin(8*x1) >>>> sum(exp(linpb)) >>>> dev1b<-dev2b<-sumsb<-rep(0,20) >>>> for (j in 1:20) >>>> { >>>> y<-rpois(1000,exp(linpb)) >>>> sumsb[j]<-sum(y) >>>> m1<-gam(y~s(x1),family="poisson") >>>> m2<-gam(y~s(x1,k=20),family="poisson") >>>> dev1b[j]=m1$dev >>>> dev2b[j]=m2$dev >>>> } >>>> mean(sumsb) >>>> sd(sumsb) >>>> mean(dev1b) >>>> sd(dev1b) >>>> mean(dev2b) >>>> sd(dev2b) >>>> >>>> #dev much lower than expected >>>> >>>> m1q<-gam(y~s(x1),family="quasipoisson",scale=-1) >>>> m1q$scale >>>> m1q$dev/(1000-sum(m1q$edf)) >>>> >>>> #scale estimate < 1 >>>> >>>> sum((y-fitted(m1q))^2/fitted(m1q))/(1000-sum(m1q$edf)) >>>> >>>> #Pearson estimate of scale closer, but also < 1 >>>> >>>> >>>> linpc<--2+exp(-2*x1)*sin(8*x1) >>>> sum(exp(linpc)) >>>> dev1c<-dev2c<-sumsc<-rep(0,20) >>>> for (j in 1:20) >>>> { >>>> y<-rpois(1000,exp(linpc)) >>>> sumsc[j]<-sum(y) >>>> m1<-gam(y~s(x1),family="poisson") >>>> m2<-gam(y~s(x1,k=20),family="poisson") >>>> dev1c[j]=m1$dev >>>> dev2c[j]=m2$dev >>>> } >>>> mean(sumsc) >>>> sd(sumsc) >>>> mean(dev1c) >>>> sd(dev1c) >>>> mean(dev2c) >>>> sd(dev2c) >>>> >>>> #dev much lower than expected >>>> >>>> m1q<-gam(y~s(x1),family="quasipoisson",scale=-1) >>>> m1q$scale >>>> m1q$sig2 >>>> m1q$dev/(1000-sum(m1q$edf)) >>>> >>>> #scale estimate < 1 >>>> >>>> sum((y-fitted(m1q))^2/fitted(m1q))/(1000-sum(m1q$edf)) >>>> >>>> #Pearson estimate of scale ok >>>> >>>> >>> >>> >>> -- >>> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK >>> +44 (0)1225 386603 http://people.bath.ac.uk/sw283 >>> >>> >> >> > > > -- > Simon Wood, Mathematical Science, University of Bath BA2 7AY UK > +44 (0)1225 386603 http://people.bath.ac.uk/sw283 > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.