Hi everyone,

This is about the documentation on the model.frame() help page. The get_all_vars() function (added in R 2.5.0) is a great addition, but its '...' argument behaves differently from that of model.frame(), with which it is documented, and this creates ambiguity. The current docs read:

\item{\dots}{further arguments such as \code{data}, \code{na.action},
  \code{subset}. Any additional arguments such as \code{offset} and
  \code{weights} which reach the default method are used to create
  further columns in the model frame, with parenthesised names such as
  \code{"(offset)"}.}

This is true only for model.frame() methods, not for get_all_vars(). For get_all_vars(), arguments passed via '...' are only ever treated as variables to add to the data frame. See for example:

> str(model.frame(mpg ~ wt, data = mtcars, subset = am == 1), give.attr = FALSE)
'data.frame':   13 obs. of  2 variables:
 $ mpg: num  21 21 22.8 32.4 30.4 33.9 27.3 26 30.4 15.8 ...
 $ wt : num  2.62 2.88 2.32 2.2 1.61 ...

> str(get_all_vars(mpg ~ wt, data = mtcars, subset = am == 1), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg   : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt    : num  2.62 2.88 2.32 3.21 3.44 ...
 $ subset: logi  TRUE TRUE TRUE FALSE FALSE FALSE ...

The behavior of '...' thus differs between the two functions: model.frame() uses subset to select rows, while get_all_vars() keeps all 32 rows and adds a column named "subset".

This is (I think) quite a problematic ambiguity. It might best be resolved by adding data, na.action, and subset as formal arguments to the generic and to all current model.frame() methods (with docs updated accordingly), but that would require a more thorough patch and testing. In lieu of that, a simple documentation change could at least describe the current behavior more accurately:

\item{\dots}{for \code{get_all_vars}, further named columns to include
  in the model frame. For \code{model.frame} methods, a mix of further
  arguments such as \code{data}, \code{na.action}, \code{subset} to pass
  to the default method. Any additional arguments (such as \code{offset}
  and \code{weights} or other named arguments) which reach the default
  method are used to create further columns in the model frame, with
  parenthesised names such as \code{"(offset)"}.}

That at least describes what is currently happening.

Relatedly, it may be worth noting that additional variables passed via '...' to get_all_vars() are subject to vector recycling, whereas those passed to model.frame.default() are not:

> str(get_all_vars(mpg ~ wt, data = mtcars, new = 2), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt : num  2.62 2.88 2.32 3.21 3.44 ...
 $ new: num  2 2 2 2 2 2 2 2 2 2 ...

> str(get_all_vars(mpg ~ wt, data = mtcars, new = rep(2, nrow(mtcars))), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt : num  2.62 2.88 2.32 3.21 3.44 ...
 $ new: num  2 2 2 2 2 2 2 2 2 2 ...

> str(model.frame.default(mpg ~ wt, data = mtcars, new = rep(2, nrow(mtcars))), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg  : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt   : num  2.62 2.88 2.32 3.21 3.44 ...
 $ (new): num  2 2 2 2 2 2 2 2 2 2 ...

> str(model.frame.default(mpg ~ wt, data = mtcars, new = 2), give.attr = FALSE)
Error in model.frame.default(mpg ~ wt, data = mtcars, new = 2) :
  variable lengths differ (found for '(new)')

But maybe that belongs in the "Details" section? (Or it's a bug; I don't really know.)

Thanks in advance for your consideration.

Best,
-Thomas

Thomas J. Leeper
http://www.thomasleeper.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel