I'm trying to write a function to make gam (in mgcv) give me the fixed-effects "within" estimator, which is equivalent to the OLS dummy variable estimator. Basically this involves subtracting the within-group means from the IVs and the DV, adding their overall means back in, and fitting the model. For OLS, this is equivalent to running a regression with a dummy variable for each group. With many groups, this gets computationally prohibitive, especially if you want to bootstrap. Eventually I want to apply this procedure to all of gam's splined terms, but for now I'm simply trying to modify gam's G object to do so only for one parametrically-represented variable.

The G object feeds everything to gam needed to fit the model. The code below modifies the design matrix, the DV, and the formula. My question: what else needs to change in G to make the FE estimator equivalent to the dummy variable estimator? Unfortunately I can't find any technical information or well-commented code describing how gam works in this regard, nor good descriptions of what the constituents of G are. Nor can I look at the gam.setup function -- I think that it is compiled in C (or something). Responses that can make the estimate of the coefficient on x (below) equal for the two models would be really welcome, as would pointers to useful documentation.

rm(list=ls())

library(mgcv)
library(plyr)
x = rexp(100)
ID = ceiling(runif(100)*10)
y = exp(x)+rnorm(100)+ID
t = ceiling(runif(100)*5)
data = data.frame(y,x,ID,t)
data = data[with(data,order(ID,t)),]

m = gam(y~x+as.factor(ID),fit=FALSE)

transform = function(G,ID){
    rep.row = function(x,n){matrix(rep(x,each=n),nrow=n)}
    X = G$X[,grepl(colnames(data.frame(ID=ID)),colnames(G$X))==FALSE]
    dmframe = as.data.frame(cbind(ID,X))
dm <- as.matrix(ddply(dmframe, "ID", colwise(scale, center = TRUE, scale = FALSE)))
    cm = rep.row(colMeans(dmframe[,2:ncol(dmframe)]),nrow(dmframe))
    X = dm[,2:ncol(dm)]+cm
    G$X = X

    y = G$y
    dmframe = as.data.frame(cbind(ID,y))
dm <- ddply(dmframe, "ID", colwise(scale, center = TRUE, scale = FALSE))
    dm = as.matrix(dm)
    cm = mean(dmframe[,2])
    y = dm[,2]+cm
    G$y = y

    spec = as.character(G$formula)[3]; n=nchar(spec)
G$formula = as.formula(paste(as.character(G$formula)[2],as.character(G$formula)[1],substr(spec,start=1,stop=n-15)))
    }

#m = gam(y~s(x,k=7)+as.factor(ID),fit=FALSE,sp=0)
out = transform(m,data$ID)
G=m; ID=data$ID

mfe = gam(out)
summary(mfe)
mlsdv = gam(y~x+as.factor(ID))
#mlsdv = gam(y~s(x,k=7)+as.factor(ID),sp=0)
summary(mlsdv)

#The coefficient on x should be equal for the two models. The intercept should represent the average of the fixed effects. cbind(as.matrix(t(mlsdv$coef))[,grepl(colnames(data.frame(ID=ID)),colnames(as.matrix(t(mlsdv$coef))))==FALSE], mfe$coef)

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to