On Aug 14, 2013, at 11:08 AM, Andrew Crane-Droesch wrote: > Hello, > > I am trying to make a slight modification to the way gam (mgcv) works. > I want to modify the G$X matrix, which is the design matrix, to > accommodate an estimator that I am trying to program. I am working with > panel data, and I want to take all continuous variables, including basis > functions evaluated at covariate variables, subtract their groupwise > means, and then add in their overall mean. (This is an alternative to > having a dummy variable for every single group, and I'm working in a > context where random effects aren't appropriate). I'd then do the same > thing with G$y. Afterwards I want to pass the modified G$X and G$Y > matrices to estimate.gam, which takes G as an argument. > > As far as I can tell, G$X is organized to have the parametric terms > (including factors) at the front with column names, and nonparametric > terms at the end without column names. So identifying which terms to > apply the transformation to should be straightforward. > > I've had some luck tweaking the internals of mgcv before, changing the > default plotting behavior of vis.gam. But I am having trouble doing so > with gam itself. To test things out, I did the following: > > * type "gam" into the terminal, to get the printout of the function's > code. > * add a line below line 70: G$X = G$X^2 (just to see if I can make an > arbitrary transformation) > * change the function name from gam to mod.gam, run it, and then run > mod.gam on an arbitrary dataset. > > When I do so, I get the following error: > Error in mod.gam(y ~ s(x)) : could not find function "variable.summary" > > It turns out that variable.summary, along with gam.setup, estimate.gam, > and perhaps other constituents of gam do NOT have help pages, nor does > typing them into the terminal reveal their code. Am I encountering > functions compiled from some non-R language ©?
You are encountering functions that are not exported from mgcv. So the environment where the interpreter looks for them is not the same as when `gam` is called. You could get them to "work" by prepending `mgcv::` to each instance of the "not-found" functions. You might also try just this: environment(mod.gam) <- environment(gam) I seem to get success with that strategy but I am not qualified to assess your coding chnages > environment(mod.gam) <- environment(gam) > modm = mod.gam(y~s(x)) > > modm Family: gaussian Link function: identity Formula: y ~ s(x) Estimated degrees of freedom: 8.2 total = 9.2 GCV score: 1.375422 -- David. > gam.setup, estimate.gam > > Regardless, I would greatly appreciate any help with my problem of > transforming the model matrix before fitting. I'd very much prefer to > use mgcv, with all of the work that has obviously gone into it, rather > than reinventing the wheel. > > My code for reproducing the error is below. > > Thanks in advance for any help. > > Best, > Andrew > > library(mgcv) > gam > mod.gam = function (formula, family = gaussian(), data = list(), weights > = NULL, > subset = NULL, na.action, offset = NULL, method = "GCV.Cp", > optimizer = c("outer", "newton"), control = list(), scale = 0, > select = FALSE, knots = NULL, sp = NULL, min.sp = NULL, H = NULL, > gamma = 1, fit = TRUE, paraPen = NULL, G = NULL, in.out = NULL, > ...) > { > control <- do.call("gam.control", control) > if (is.null(G)) { > gp <- interpret.gam(formula) > cl <- match.call() > mf <- match.call(expand.dots = FALSE) > mf$formula <- gp$fake.formula > mf$family <- mf$control <- mf$scale <- mf$knots <- mf$sp <- > mf$min.sp <- mf$H <- mf$select <- mf$gamma <- mf$method <- mf$fit <- > mf$paraPen <- mf$G <- mf$optimizer <- mf$in.out <- mf$... <- NULL > mf$drop.unused.levels <- TRUE > mf[[1]] <- as.name("model.frame") > pmf <- mf > mf <- eval(mf, parent.frame()) > if (nrow(mf) < 2) > stop("Not enough (non-NA) data to do anything meaningful") > terms <- attr(mf, "terms") > vars <- all.vars(gp$fake.formula[-2]) > inp <- parse(text = paste("list(", paste(vars, collapse = ","), > ")")) > if (!is.list(data) && !is.data.frame(data)) > data <- as.data.frame(data) > dl <- eval(inp, data, parent.frame()) > names(dl) <- vars > var.summary <- variable.summary(gp$pf, dl, nrow(mf)) > rm(dl) > pmf$formula <- gp$pf > pmf <- eval(pmf, parent.frame()) > pterms <- attr(pmf, "terms") > if (is.character(family)) > family <- eval(parse(text = family)) > if (is.function(family)) > family <- family() > if (is.null(family$family)) > stop("family not recognized") > if (family$family[1] == "gaussian" && family$link == > "identity") > am <- TRUE > else am <- FALSE > if (!control$keepData) > rm(data) > G <- gam.setup(gp, pterms = pterms, data = mf, knots = knots, > sp = sp, min.sp = min.sp, H = H, absorb.cons = TRUE, > sparse.cons = 0, select = select, idLinksBases = > control$idLinksBases, > scale.penalty = control$scalePenalty, paraPen = paraPen) > G$var.summary <- var.summary > G$family <- family > if (ncol(G$X) > nrow(G$X) + nrow(G$C)) > stop("Model has more coefficients than data") > G$terms <- terms > G$pterms <- pterms > G$mf <- mf > G$cl <- cl > G$am <- am > if (is.null(G$offset)) > G$offset <- rep(0, G$n) > G$min.edf <- G$nsdf - dim(G$C)[1] > if (G$m) > for (i in 1:G$m) G$min.edf <- G$min.edf + > G$smooth[[i]]$null.space.dim > G$formula <- formula > environment(G$formula) <- environment(formula) > } > if (!fit) > return(G) > G$conv.tol <- control$mgcv.tol > G$max.half <- control$mgcv.half > G$X = G$X^2 > object <- estimate.gam(G, method, optimizer, control, in.out, > scale, gamma, ...) > if (!is.null(G$L)) { > object$full.sp <- as.numeric(exp(G$L %*% log(object$sp) + > G$lsp0)) > names(object$full.sp) <- names(G$lsp0) > } > names(object$sp) <- names(G$sp) > object$paraPen <- G$pP > object$formula <- G$formula > object$var.summary <- G$var.summary > object$cmX <- G$cmX > object$model <- G$mf > object$na.action <- attr(G$mf, "na.action") > object$control <- control > object$terms <- G$terms > object$pterms <- G$pterms > object$assign <- G$assign > object$contrasts <- G$contrasts > object$xlevels <- G$xlevels > object$offset <- G$offset > if (!is.null(G$Xcentre)) > object$Xcentre <- G$Xcentre > if (control$keepData) > object$data <- data > object$df.residual <- nrow(G$X) - sum(object$edf) > object$min.edf <- G$min.edf > if (G$am && !(method %in% c("REML", "ML", "P-ML", "P-REML"))) > object$optimizer <- "magic" > else object$optimizer <- optimizer > object$call <- G$cl > class(object) <- c("gam", "glm", "lm") > object > } > x = rnorm(100) > y = exp(y)+rnorm(100) > m = gam(y~s(x)) > modm = mod.gam(y~s(x)) > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.