Dear All,

I'm using gam for a project that involves multiple imputation, and it has led me to a question about how f-statistics/p-values work in gam. Specifically, how do the values in summary(gam) get generated? As is made clear by the dumb example below, I'm manipul;ating gam objects to reflect the MI procedures that I'm using. I don't trust the f-statistics/p-values that I'm getting, but I'd like to know how to further manipulate these objects to get trustworthy values. Part of that invovles knowing how the output in summary(gam) gets generated, and from what elements of a gam object.

Here is the example:

library(mgcv)
par(mfrow=c(2,2))
impute <- function (a, a.impute){
    ifelse (is.na(a), a.impute, a)
    }

#make some dumb fake data
x1.knots = c(-1,-.5,0,.5,1)
x2.knots = c(-1,-.5,0,.5,1)
K=5
x1 = rnorm(100)
x2 = rnorm(100)
y = .05*exp(x1)-.5*x1 + .1*x2 + x1*x2 + rnorm(100)
#some of it is missing
x1[81:100] = NA
x2[71:90] = NA

#do a few dumb imputations, and fit models to the partially-imputed data
x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))
m1 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots, x2.imp = x2.knots))
plot(m1,plot.type="contour",scheme=2,main="Imp 1")

x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))
m2 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots, x2.imp = x2.knots))
plot(m2,plot.type="contour",scheme=2,main="Imp 2")

x1.imp = impute(x1, rnorm(100))
x2.imp = impute(x2, rnorm(100))
m3 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots, x2.imp = x2.knots))
plot(m3,plot.type="contour",scheme=2,main="Imp 3")

results = list(m1,m2,m3)

#Combine the results according to rubin's rules
reps=3
bhat=results[[1]]$coeff
for (i in 2:reps){
    bhat=bhat+results[[i]]$coeff
    }
bhat = bhat/reps

W=results[[1]]$Vp
for (i in 2:reps){
    W = W+results[[i]]$Vp
    }
W = W/reps
B= (results[[1]]$coeff-bhat) %*% t(results[[1]]$coeff-bhat)
for (i in 2:reps){
    B = B+(results[[i]]$coeff-bhat) %*% t(results[[i]]$coeff-bhat)
    }
B=B/(reps-1)
Vb = W+(1+1/reps)*B

#Put the results into a convenient gam object
MI = results[[1]]
MI$coefficients=bhat
MI$Vp = Vb

#and summarize
summary(MI)
plot(MI,plot.type="contour",scheme=2,main="MI")

When I do something similar with non-fake data I get wacky f-statistics and p-values. For example, F could be >100 and p could equal 1. This probably has something to do with degrees of freedom.

My easy question: what goes on under the hood with gam to generate these values? What parts of a gam object are called up?

My harder question: how might one construct principled analogs of these statistics in an MI context, when degrees of freedom will vary across models, depending on the imputed data? Has anybody thought about this, or do I have have some serious pencil-and-paper ahead of me?

Thanks,
Andrew

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to