What is a reliable way to go from a column of a model matrix back to the column (or columns) of the original data source used to make the model matrix? I can come up with a method that seems to work, but I don't see guarantees in the documentation that it will.
In particular, does the order of the term.labels match the order of columns for factors in a terms object? The documentation says the model.matrix assign attribute uses the ordering of term.labels.

If anyone can tell me if this approach is reliable, or of one that is, I would appreciate it.

Ross Boylan

Proposed function and a little example follow.

# return a vector v such that data[,v[i]] contributed to mm[,i]
# mm = model matrix produced by
# form = formula
# data = data
reverse.map <- function(mm, form, data){
    tt <- terms(form, data=data)
    ttf <- attr(tt, "factors")
    mmi <- attr(mm, "assign")
    # this depends on assign using same order as columns of factors
    # entries in mmi that are 0 (the intercept) are silently dropped
    ttf2 <- ttf[,mmi]
    # take the first row that contributes
    r <- apply(ttf2, 2, function(is) rownames(ttf)[is > 0][1])
    match(r, colnames(data))
}

> ### experiment with mapping model matrix to original columns
> df <- sp2b[sample(nrow(sp2b), 8), c("pEthnic", "ethnic_sg", "rac_gay")]
> form <- ~pEthnic+ethnic_sg*rac_gay
> mm <- model.matrix(form, df)
> tt <- terms(form, data=df)
> ttf <- attr(tt, "factors")
> mmi <- attr(mm, "assign")
> df
      pEthnic ethnic_sg rac_gay
1366 Afr Amer  Afr Amer    3.25
3052 Afr Amer  Afr Amer    1.75
3012   Latino  Afr Amer    2.00
369  Afr Amer  Asian/PI    2.00
529     White  Asian/PI    2.00
194  Asian/PI  Asian/PI    3.25
126     White  Asian/PI    2.25
2147   Latino    Latino    2.75
> colnames(mm)
 [1] "(Intercept)"               "pEthnicAsian/PI"
 [3] "pEthnicLatino"             "pEthnicOther"
 [5] "pEthnicWhite"              "ethnic_sgAsian/PI"
 [7] "ethnic_sgLatino"           "rac_gay"
 [9] "ethnic_sgAsian/PI:rac_gay" "ethnic_sgLatino:rac_gay"
> ttf  # term "factors"
          pEthnic ethnic_sg rac_gay ethnic_sg:rac_gay
pEthnic         1         0       0                 0
ethnic_sg       0         1       0                 1
rac_gay         0         0       1                 1
> mmi  # model matrix "assign"
 [1] 0 1 1 1 1 2 2 3 4 4
> reverse.map(mm, form, df)
[1] 1 1 1 1 2 2 3 2 2
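A minimal sketch of a variant of the function above, offered only as an illustration and not as a documented guarantee. It tests the ordering assumption at run time instead of relying on it, keeps the intercept as an empty entry so the result has one element per model matrix column, and returns every contributing variable, not just the first. The name reverse.map.all and the small data frame d are hypothetical and made up for this example.

```r
# Hypothetical helper (not part of the original post): for each column of
# the model matrix, return the names of all variables that contribute to it.
# The intercept column (assign == 0) maps to character(0).
reverse.map.all <- function(mm, form, data) {
  tt  <- terms(form, data = data)
  ttf <- attr(tt, "factors")
  mmi <- attr(mm, "assign")
  # Check the ordering assumption explicitly: columns of "factors"
  # should line up with "term.labels", which "assign" indexes into.
  stopifnot(all(colnames(ttf) == attr(tt, "term.labels")))
  lapply(mmi, function(i) {
    if (i == 0) character(0) else rownames(ttf)[ttf[, i] > 0]
  })
}

# Made-up data, used only to exercise the function.
d <- data.frame(a = factor(c("x", "y", "z", "x")),
                b = factor(c("u", "v", "u", "v")),
                z = c(1.5, 2.0, 0.5, 3.0))
f   <- ~ a + b * z
mm  <- model.matrix(f, d)
res <- reverse.map.all(mm, f, d)
names(res) <- colnames(mm)
res
```

The result is a list, so an interaction column such as "bv:z" can carry both "b" and "z". Applying match(x, colnames(data)) to each element would turn the names into column indices. Note that the row names of "factors" are the variables as written in the formula, so a term such as log(x) would not match a column name directly.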