I would use eval(), but I think that most formula-using functions do it more like the following.

  getRHSLength <- function (formula, data = parent.frame())
  {
      rhsExpr <- formula[[length(formula)]]
      rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
      length(rhsValue)
  }

* use eval() instead of get() so that variables are found in ancestral
  environments of envir (if envir is an environment), not just in envir
  itself.
* just evaluate the stuff in the formula using the non-standard evaluation
  frame, and call length() in the current frame. Otherwise, if envir
  inherits directly from emptyenv(), the 'length' function will not be found.
* use envir=data so it looks first in the data argument for variables.
* the enclos argument is used only if envir is not an environment; it is
  then used to find variables that are not in envir.

Here are some examples:

  > X <- 1:10
  > getRHSLength(~X)
  [1] 10
  > getRHSLength(~X, data=data.frame(X=1:2))
  [1] 2
  > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame())
  [1] 4
  > getRHSLength((function(){X <- 1:4; ~X})(), data=data.frame(X=1:2))
  [1] 2
  > getRHSLength((function(){X <- 1:4; ~X})(), data=list2env(data.frame()))
  [1] 10
  > getRHSLength((function(){X <- 1:4; ~X})(), data=emptyenv())
  Error in eval(expr, envir, enclos) : object 'X' not found

I think you will see the same lookups if you try analogous things with lm().

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 17, 2014 at 11:04 AM, Joris Meys <jorism...@gmail.com> wrote:
> Dear R gurus,
>
> I need to know the length of a variable (let's call that X) that is
> mentioned in a formula. So obviously I look for the environment from which
> the formula is called and then I have two options:
>
> - using eval(parse(text='length(X)'),
>     envir=environment(formula) )
>
> - using length(get('X'),
>     envir=environment(formula) )
>
> a bit of benchmarking showed that the first option is about 20 times
> slower, to that extent that if I repeat it 10,000 times I save more than
> half a second. So speed is not really an issue here.
>
> Personally I'd go for option 2 as that one is easier to read and does the
> job nicely, but with these functions I'm always a bit afraid that I'm
> overseeing important details or side effects here (possibly memory issues
> when working with larger data).
>
> Anybody an idea what the dangers are of these methods, and which one is the
> most robust method?
>
> Thank you
> Joris
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
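
For anyone comparing the two approaches side by side, here is a minimal sketch (not from the thread itself) that puts the get()-based option from the question next to the eval()-based getRHSLength() from the reply. The names getLengthViaGet and makeFormula are hypothetical helpers introduced only for illustration, and the right-hand side is deliberately an expression, log(X), rather than a bare name. Every call passes data explicitly, because the default data = parent.frame() would make the lookup start in the caller's frame rather than in the formula's environment.

```r
# Copied from the reply: evaluate the formula's right-hand side in 'data',
# falling back to the formula's environment, then take the length here.
getRHSLength <- function(formula, data = parent.frame()) {
  rhsExpr <- formula[[length(formula)]]
  rhsValue <- eval(rhsExpr, envir = data, enclos = environment(formula))
  length(rhsValue)
}

# Hypothetical helper mirroring the get()-based option from the question.
# It needs the variable name spelled out and never consults a data argument.
getLengthViaGet <- function(formula, name = "X") {
  length(get(name, envir = environment(formula)))
}

# Hypothetical formula factory: X lives only in the function's own frame.
makeFormula <- function() {
  X <- 1:4
  ~ log(X)
}
f <- makeFormula()

# Empty data frame: X is not in data, so it is found via enclos.
getRHSLength(f, data = data.frame())          # 4

# X present in data: data wins over the formula's environment.
getRHSLength(f, data = data.frame(X = 1:2))   # 2

# The get() version only ever sees the formula's environment.
getLengthViaGet(f)                            # 4
```

The eval() version takes whatever expression sits on the right-hand side and applies the data-first lookup order, whereas the get() sketch has to be told the variable name and cannot be redirected to a data frame.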