mgcv: distribution of dev hi
I can't tell if this is a simple error. I'm puzzled by the distribution of dev when fitting a gam to Poisson generated data. I expected dev to be approximately chi-squared on residual d.f., i.e. about 1000 in each case below. In particular, the low values in the 3rd and 4th versions would suggest scale < 1, yet the data is Poisson generated. The problem isn't caused by insufficient k values in the smoother. Does this mean that with sparse data the distribution of dev is no longer approx chi-sq on residual df? Does it mean the scale estimate when fitting quasipoisson should be the Pearson version? thanks greg library(mgcv) set.seed(1) x1<-runif(1000) linp<-2+exp(-2*x1)*sin(8*x1) sum(exp(linp)) dev1<-dev2<-sums<-rep(0,20) for (j in 1:20) { y<-rpois(1000,exp(linp)) sums[j]<-sum(y) m1<-gam(y~s(x1),family="poisson") m2<-gam(y~s(x1,k=20),family="poisson") dev1[j]=m1$dev dev2[j]=m2$dev } mean(sums) sd(sums) mean(dev1) sd(dev1) mean(dev2) sd(dev2) #dev slighly higher than expected linpa<--1+exp(-2*x1)*sin(8*x1) sum(exp(linpa)) dev1a<-dev2a<-sumsa<-rep(0,20) for (j in 1:20) { y<-rpois(1000,exp(linpa)) sumsa[j]<-sum(y) m1<-gam(y~s(x1),family="poisson") m2<-gam(y~s(x1,k=20),family="poisson") dev1a[j]=m1$dev dev2a[j]=m2$dev } mean(sumsa) sd(sumsa) mean(dev1a) sd(dev1a) mean(dev2a) sd(dev2a) #dev a bit lower than expected linpb<--1.5+exp(-2*x1)*sin(8*x1) sum(exp(linpb)) dev1b<-dev2b<-sumsb<-rep(0,20) for (j in 1:20) { y<-rpois(1000,exp(linpb)) sumsb[j]<-sum(y) m1<-gam(y~s(x1),family="poisson") m2<-gam(y~s(x1,k=20),family="poisson") dev1b[j]=m1$dev dev2b[j]=m2$dev } mean(sumsb) sd(sumsb) mean(dev1b) sd(dev1b) mean(dev2b) sd(dev2b) #dev much lower than expected m1q<-gam(y~s(x1),family="quasipoisson",scale=-1) m1q$scale m1q$dev/(1000-sum(m1q$edf)) #scale estimate < 1 sum((y-fitted(m1q))^2/fitted(m1q))/(1000-sum(m1q$edf)) #Pearson estimate of scale closer, but also < 1 linpc<--2+exp(-2*x1)*sin(8*x1) sum(exp(linpc)) dev1c<-dev2c<-sumsc<-rep(0,20) for (j in 1:20) { y<-rpois(1000,exp(linpc)) sumsc[j]<-sum(y) m1<-gam(y~s(x1),family="poisson") m2<-gam(y~s(x1,k=20),family="poisson") dev1c[j]=m1$dev dev2c[j]=m2$dev } mean(sumsc) sd(sumsc) mean(dev1c) sd(dev1c) mean(dev2c) sd(dev2c) #dev much lower than expected m1q<-gam(y~s(x1),family="quasipoisson",scale=-1) m1q$scale m1q$sig2 m1q$dev/(1000-sum(m1q$edf)) #scale estimate < 1 sum((y-fitted(m1q))^2/fitted(m1q))/(1000-sum(m1q$edf)) #Pearson estimate of scale ok ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.