[R] gam (mgcv), multiple imputation, f-stats/p-values, and summary(gam)

Andrew Crane-Droesch Wed, 08 May 2013 07:14:15 -0700

Dear All,

I'm using gam for a project that involves multiple imputation, and ithas led me to a question about how f-statistics/p-values work in gam.Specifically, how do the values in summary(gam) get generated? As ismade clear by the dumb example below, I'm manipul;ating gam objects toreflect the MI procedures that I'm using. I don't trust thef-statistics/p-values that I'm getting, but I'd like to know how tofurther manipulate these objects to get trustworthy values. Part ofthat invovles knowing how the output in summary(gam) gets generated, andfrom what elements of a gam object.


Here is the example:

library(mgcv)
par(mfrow=c(2,2))
impute <- function (a, a.impute){
    ifelse (is.na(a), a.impute, a)
    }

#make some dumb fake data
x1.knots = c(-1,-.5,0,.5,1)
x2.knots = c(-1,-.5,0,.5,1)
K=5
x1 = rnorm(100)
x2 = rnorm(100)
y = .05*exp(x1)-.5*x1 + .1*x2 + x1*x2 + rnorm(100)
#some of it is missing
x1[81:100] = NA
x2[71:90] = NA

#do a few dumb imputations, and fit models to the partially-imputed data
x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))

m1 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots,x2.imp = x2.knots))

plot(m1,plot.type="contour",scheme=2,main="Imp 1")

x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))

m2 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots,x2.imp = x2.knots))

plot(m2,plot.type="contour",scheme=2,main="Imp 2")

x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))

m3 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots,x2.imp = x2.knots))

plot(m3,plot.type="contour",scheme=2,main="Imp 3")

results = list(m1,m2,m3)

#Combine the results according to rubin's rules
reps=3
bhat=results[[1]]$coeff
for (i in 2:reps){
    bhat=bhat+results[[i]]$coeff
    }
bhat = bhat/reps

W=results[[1]]$Vp
for (i in 2:reps){
    W = W+results[[i]]$Vp
    }
W = W/reps
B= (results[[1]]$coeff-bhat) %*% t(results[[1]]$coeff-bhat)
for (i in 2:reps){
    B = B+(results[[i]]$coeff-bhat) %*% t(results[[i]]$coeff-bhat)
    }
B=B/(reps-1)
Vb = W+(1+1/reps)*B

#Put the results into a convenient gam object
MI = results[[1]]
MI$coefficients=bhat
MI$Vp = Vb

#and summarize
summary(MI)
plot(MI,plot.type="contour",scheme=2,main="MI")

When I do something similar with non-fake data I get wacky f-statisticsand p-values. For example, F could be >100 and p could equal 1. Thisprobably has something to do with degrees of freedom.

My easy question: what goes on under the hood with gam to generatethese values? What parts of a gam object are called up?

My harder question: how might one construct principled analogs of thesestatistics in an MI context, when degrees of freedom will vary acrossmodels, depending on the imputed data? Has anybody thought about this,or do I have have some serious pencil-and-paper ahead of me?


Thanks,
Andrew

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] gam (mgcv), multiple imputation, f-stats/p-values, and summary(gam)

Reply via email to