>>>>> "GS" == Gavin Simpson <[EMAIL PROTECTED]>
>>>>>     on Tue, 16 Aug 2005 18:44:23 +0100 writes:

GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck wrote:
>> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
>> > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
>> > > It can handle data frames like this:
>> > >
>> > >   model.frame(y1)
>> > > or
>> > >   model.frame(~., y1)
>> >
>> > Thanks Gabor,
>> >
>> > Yes, I know that works, but I want the function coca.formula to accept a
>> > formula like this y2 ~ y1, with both y1 and y2 being data frames. It is
>>
>> The expressions I gave work generally (i.e. lm, glm, ...), not just in
>> model.matrix, so would it be ok if the user just does this?
>>
>>   yourfunction(y2 ~., y1)

GS> Thanks again Gabor for your comments,

GS> I'd prefer the y1 ~ y2 as data frames - as this is the most natural way
GS> of doing things. I'd like to have (y2 ~., y1) as well, and
GS> (y2 ~ spp1 + spp2 + spp3, y1) also work - silently without any trouble.

I'm sorry, Gavin, I tend to disagree quite a bit.

The formula notation has quite a history in the S language, and AFAIK
the idea was never to use data.frames as formula components, but rather
as "environments" in which formula components are looked up --- exactly
as Gabor has explained.

To break with such a deeply rooted principle, you should have very, very
good reasons, because you're breaking the concepts on which all other
uses of formulae are based. And this could lead to much confusion among
your users, at least in the way they should learn to think about what
formulae mean.

Martin

>> If it really is important to do it the way you describe, are the data
>> frames necessarily numeric? If so you could preprocess your formula by
>> placing as.matrix around all the variables representing data frames
>> using something like this:
>>
>> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

GS> Yes, they are numeric matrices (as data frames). I've looked at this,
GS> but I'd prefer to not have to do too much messing with the formula.

>> Of course, if they are necessarily numeric maybe they can be matrices
>> in the first place?

GS> Because read.table etc. produce data.frames and this is the natural
GS> way to work with data in R.

but it is also slightly inefficient if they are numeric.
There are places for data frames and for matrices.

Why should it be a problem to use

    M <- as.matrix(read.table(..))

?

For large files, it could be quite a bit more efficient, at the cost of
a bit more code, to use scan() to read the numeric data directly:

    h1 <- scan(..., what = "", nlines = 1)   ## <read variable names>
    nc <- length(h1)
    ## scan() returns the values row by row, hence byrow = TRUE
    a <- matrix(scan(...., what = numeric(), ...), ncol = nc, byrow = TRUE,
                dimnames = list(NULL, h1))

Maybe it would be useful to package this into a small utility with usage

    read.matrix(..., type = numeric(), ...)

GS> Following your suggestions, I altered my code to evaluate the rhs of
GS> the formula and check if it was of class "data.frame". If it is then I
GS> stop processing and return it as a data.frame at this point. If not, it
GS> eventually gets passed on to model.frame() for it to deal with it.

GS> So far - limited testing - it seems to do what I wanted all along. I'm
GS> sure there's a gotcha in there somewhere but at least the code runs so
GS> I can check for problems against my examples.

GS> Right, back to writing documentation...

GS> G

>> > more intuitive, to my mind at least for this particular example and
>> > analysis, to specify the formula with a data frame on the rhs.
>> >
>> > model.frame doesn't work with the formula "~ y1" if the object y1, in
>> > the environment when model.frame evaluates the formula, is a data.frame.
>> > It works if y1 is a matrix, however. I'd like to work around this
>> > problem, say by creating an environment in which y1 is modified to be a
>> > matrix, if possible. Can this be done?
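
A minimal sketch of the environment idea asked about just above, assuming the data frames are purely numeric. The helper name model_frame_df is hypothetical (it is not part of R or of any package mentioned in this thread). It copies each data frame named in the formula into a child environment as a matrix and lets model.frame() evaluate the formula there:

```r
# Hypothetical helper, for illustration only: build a model frame from a
# formula whose variables may be numeric data frames.
model_frame_df <- function(formula, env = environment(formula)) {
  # Child environment: coerced copies live here; every other object is
  # still found in `env` through the parent chain.
  tmp <- new.env(parent = env)
  for (nm in all.vars(formula)) {
    if (exists(nm, envir = env)) {
      obj <- get(nm, envir = env)
      if (is.data.frame(obj))
        assign(nm, as.matrix(obj), envir = tmp)
    }
  }
  # With no `data` argument, model.frame() looks variables up in the
  # environment of the formula, so point that at the child environment.
  environment(formula) <- tmp
  model.frame(formula)
}

# The example data from the original post
y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
             nrow = 5, byrow = TRUE)
y1 <- as.data.frame(y1)
rownames(y1) <- paste("site", 1:5, sep = "")
colnames(y1) <- paste("spp", 1:4, sep = "")

mf <- model_frame_df(~ y1)
```

The caller's y1 stays a data frame; only the copy seen by model.frame() is a matrix. The sketch does not handle a data or subset argument, so it covers only the "~ y1" case and not the general formula interface discussed in this thread.
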
>> >
>> > At the moment I have something working by grabbing the bits of the
>> > formula and then using get() to grab the named object. Of course, this
>> > won't work if someone wants to use R's formula interface with the
>> > following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
>> > subset argument common to many formula implementations. I'd like to have
>> > the function work in as general a manner as possible, so I'm fishing
>> > around for potential solutions.
>> >
>> > All the best,
>> >
>> > Gav
>> >
>> > > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
>> > > > Hi I'm having a problem with model.frame, encapsulated in this example:
>> > > >
>> > > > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
>> > > >              nrow = 5, byrow = TRUE)
>> > > > y1 <- as.data.frame(y1)
>> > > > rownames(y1) <- paste("site", 1:5, sep = "")
>> > > > colnames(y1) <- paste("spp", 1:4, sep = "")
>> > > > y1
>> > > >
>> > > > model.frame(~ y1)
>> > > > Error in model.frame(formula, rownames, variables, varnames, extras, extranames, :
>> > > >         invalid variable type
>> > > >
>> > > > temp <- as.matrix(y1)
>> > > > model.frame(~ temp)
>> > > >   temp.spp1 temp.spp2 temp.spp3 temp.spp4
>> > > > 1         3         1         0         1
>> > > > 2         0         1         1         0
>> > > > 3         0         0         1         0
>> > > > 4         0         0         1         1
>> > > > 5         0         1         1         1
>> > > >
>> > > > Ideally the above wouldn't have names like temp.var1, temp.var2, but one
>> > > > could deal with that later.
>> > > >
>> > > > I have tracked down the source of the error message to line 1330 in
>> > > > model.c - here I'm stumped as I don't know any C, but it looks as if the
>> > > > code is looping over the variables in the formula and checking if they
>> > > > are the right "type". So a matrix of variables gets through, but a
>> > > > data.frame doesn't.
>> > > >
>> > > > It would be good if model.frame could cope with data.frames in formulae,
>> > > > but seeing as I am incapable of providing a patch, is there a way around
>> > > > this problem?
>> > > >
>> > > > Below is the head of the function I am currently using, including the
>> > > > function for parsing the formula - borrowed and hacked from
>> > > > ordiParseFormula() in package vegan.
>> > > >
>> > > > I can work out the class of the rhs of the formula. Is there a way to
>> > > > create a suitable environment for the data argument of parseFormula()
>> > > > such that it contains the rhs dataframe coerced to a matrix, which then
>> > > > should get through model.frame.default without error? How would I go
>> > > > about manipulating/creating such an environment? Any other ideas?
>> > > >
>> > > > Thanks in advance
>> > > >
>> > > > Gav
>> > > >
>> > > > coca.formula <- function(formula, method = c("predictive", "symmetric"),
>> > > >                          reg.method = c("simpls", "eigen"), weights = NULL,
>> > > >                          n.axes = NULL, symmetric = FALSE, data)
>> > > > {
>> > > >     parseFormula <- function (formula, data)
>> > > >     {
>> > > >         browser()
>> > > >         Terms <- terms(formula, "Condition", data = data)
>> > > >         flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
>> > > >         specdata <- formula[[2]]
>> > > >         X <- eval(specdata, data, parent.frame())
>> > > >         X <- as.matrix(X)
>> > > >         formula[[2]] <- NULL
>> > > >         if (formula[[2]] == "1" || formula[[2]] == "0")
>> > > >             Y <- NULL
>> > > >         else {
>> > > >             mf <- model.frame(formula, data, na.action = na.fail)
>> > > >             Y <- model.matrix(formula, mf)
>> > > >             if (any(colnames(Y) == "(Intercept)")) {
>> > > >                 xint <- which(colnames(Y) == "(Intercept)")
>> > > >                 Y <- Y[, -xint, drop = FALSE]
>> > > >             }
>> > > >         }
>> > > >         list(X = X, Y = Y)
>> > > >     }
>> > > >     if (missing(data))
>> > > >         data <- parent.frame()
>> > > >     #browser()
>> > > >     dat <- parseFormula(formula, data)
>> > > >
>> > > > --
>> > > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>> > > > Gavin Simpson                     [T] +44 (0)20 7679 5522
>> > > > ENSIS Research Fellow             [F] +44 (0)20 7679 7565
>> > > > ENSIS Ltd. & ECRC                 [E] gavin.simpsonATNOSPAMucl.ac.uk
>> > > > UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
>> > > > 26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
>> > > > London.  WC1H 0AP.
>> > > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>> > > >
>> > > > ______________________________________________
>> > > > R-devel@r-project.org mailing list
>> > > > https://stat.ethz.ch/mailman/listinfo/r-devel
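
The read.matrix() utility suggested earlier in this message could look roughly like the sketch below. Only the name comes from the message; the argument list (file, header, sep) is an assumption made for illustration, and the sketch assumes `file` is a file name (not an open connection) holding whitespace-separated numeric data with an optional header line:

```r
# Hypothetical utility, for illustration only; not part of base R.
read.matrix <- function(file, header = TRUE, sep = "") {
  # Read the first line as character tokens to count the columns
  # (and to get the variable names when there is a header line).
  first <- scan(file, what = "", sep = sep, nlines = 1, quiet = TRUE)
  nc <- length(first)

  # Read all numeric values in one go, skipping the header if present.
  values <- scan(file, what = numeric(), sep = sep,
                 skip = if (header) 1 else 0, quiet = TRUE)

  # scan() returns the values row by row, hence byrow = TRUE.
  matrix(values, ncol = nc, byrow = TRUE,
         dimnames = list(NULL, if (header) first else NULL))
}

# Small usage example on a temporary file
tmp <- tempfile()
writeLines(c("spp1 spp2 spp3",
             "3 1 0",
             "0 1 1"), tmp)
m <- read.matrix(tmp)
unlink(tmp)
```

Because the result is a plain numeric matrix, it can go straight into a formula such as ~ m without the data.frame problem discussed in this thread.
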