[R] Modifying the internals of mgcv's G object to program a within estimator

Andrew Crane-Droesch Thu, 15 Aug 2013 10:07:17 -0700

I'm trying to write a function to make gam (in mgcv) give me thefixed-effects "within" estimator, which is equivalent to the OLS dummyvariable estimator. Basically this involves subtracting thewithin-group means from the IVs and the DV, adding their overall meansback in, and fitting the model. For OLS, this is equivalent to runninga regression with a dummy variable for each group. With many groups,this gets computationally prohibitive, especially if you want tobootstrap. Eventually I want to apply this procedure to all of gam'ssplined terms, but for now I'm simply trying to modify gam's G object todo so only for one parametrically-represented variable.

The G object feeds everything to gam needed to fit the model. The codebelow modifies the design matrix, the DV, and the formula. Myquestion: what else needs to change in G to make the FE estimatorequivalent to the dummy variable estimator? Unfortunately I can't findany technical information or well-commented code describing how gamworks in this regard, nor good descriptions of what the constituents ofG are. Nor can I look at the gam.setup function -- I think that it iscompiled in C (or something). Responses that can make the estimate ofthe coefficient on x (below) equal for the two models would be reallywelcome, as would pointers to useful documentation.


rm(list=ls())

library(mgcv)
library(plyr)
x = rexp(100)
ID = ceiling(runif(100)*10)
y = exp(x)+rnorm(100)+ID
t = ceiling(runif(100)*5)
data = data.frame(y,x,ID,t)
data = data[with(data,order(ID,t)),]

m = gam(y~x+as.factor(ID),fit=FALSE)

transform = function(G,ID){
    rep.row = function(x,n){matrix(rep(x,each=n),nrow=n)}
    X = G$X[,grepl(colnames(data.frame(ID=ID)),colnames(G$X))==FALSE]
    dmframe = as.data.frame(cbind(ID,X))

dm <- as.matrix(ddply(dmframe, "ID", colwise(scale, center = TRUE,scale = FALSE)))

    cm = rep.row(colMeans(dmframe[,2:ncol(dmframe)]),nrow(dmframe))
    X = dm[,2:ncol(dm)]+cm
    G$X = X

    y = G$y
    dmframe = as.data.frame(cbind(ID,y))

dm <- ddply(dmframe, "ID", colwise(scale, center = TRUE, scale =FALSE))

    dm = as.matrix(dm)
    cm = mean(dmframe[,2])
    y = dm[,2]+cm
    G$y = y

    spec = as.character(G$formula)[3]; n=nchar(spec)

G$formula =as.formula(paste(as.character(G$formula)[2],as.character(G$formula)[1],substr(spec,start=1,stop=n-15)))

    }

#m = gam(y~s(x,k=7)+as.factor(ID),fit=FALSE,sp=0)
out = transform(m,data$ID)
G=m; ID=data$ID

mfe = gam(out)
summary(mfe)
mlsdv = gam(y~x+as.factor(ID))
#mlsdv = gam(y~s(x,k=7)+as.factor(ID),sp=0)
summary(mlsdv)

#The coefficient on x should be equal for the two models. The interceptshould represent the average of the fixed effects.cbind(as.matrix(t(mlsdv$coef))[,grepl(colnames(data.frame(ID=ID)),colnames(as.matrix(t(mlsdv$coef))))==FALSE],mfe$coef)


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Modifying the internals of mgcv's G object to program a within estimator

Reply via email to