[Rd] Documentation of model.frame() and get_all_vars()

2017-03-26 Thread Thomas J. Leeper
Hi everyone,

This is about documentation for the model.frame() page. The
get_all_vars() function (added in R 2.5.0) is a great addition, but
the behavior of its '...' argument differs from that of
model.frame(), with which it is documented, and this creates ambiguity.
The current docs read:

\item{\dots}{further arguments such as \code{data}, \code{na.action},
\code{subset}. Any additional arguments such as \code{offset} and
\code{weights} which reach the default method are used to create
further columns in the model frame, with parenthesised names such as
\code{"(offset)"}.}

This is true only for model.frame() methods, not for get_all_vars().
For get_all_vars(), arguments passed to '...' are only ever treated as
variables to add to the data frame. See for example:

> str(model.frame(mpg ~ wt, data = mtcars, subset = am == 1), give.attr = FALSE)
'data.frame':   13 obs. of  2 variables:
 $ mpg: num  21 21 22.8 32.4 30.4 33.9 27.3 26 30.4 15.8 ...
 $ wt : num  2.62 2.88 2.32 2.2 1.61 ...

> str(get_all_vars(mpg ~ wt, data = mtcars, subset = am == 1), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg   : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt    : num  2.62 2.88 2.32 3.21 3.44 ...
 $ subset: logi  TRUE TRUE TRUE FALSE FALSE FALSE ...

The behavior of '...' thus differs between the two functions. This is
(I think) quite a problematic ambiguity. It might best be resolved by
adding data, na.action, and subset as formal arguments to the generic
and to all current model.frame() methods (with docs updated
accordingly), but that would require a more thorough patch and testing.

In lieu of that, a simple documentation change could at least describe
the current behavior more accurately:

\item{\dots}{for \code{get_all_vars}, further named columns to include
in the model frame. For \code{model.frame} methods, a mix of further
arguments such as \code{data}, \code{na.action}, \code{subset} to pass
to the default method. Any additional arguments (such as \code{offset}
and \code{weights} or other named arguments) which reach the default
method are used to create further columns in the model frame, with
parenthesised names such as \code{"(offset)"}.}

That at least describes what is currently happening.
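
In the meantime, one way to get subset-like behavior out of
get_all_vars() is to filter the rows of 'data' before the call rather
than passing 'subset' through '...'. This is only a minimal sketch of
a workaround under current behavior, not a statement of intended
usage; the object names keep, gav and mf are mine:

```r
# Sketch: get_all_vars() treats 'subset' in '...' as just another column,
# so filter the rows of 'data' before the call instead.
keep <- mtcars$am == 1
gav <- get_all_vars(mpg ~ wt, data = mtcars[keep, ])

# For comparison, model.frame() applies 'subset' itself.
mf <- model.frame(mpg ~ wt, data = mtcars, subset = am == 1)

str(gav, give.attr = FALSE)
str(mf, give.attr = FALSE)
```

Both data frames should then contain the same 13 rows for the two
variables in the formula.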

Relatedly, it may be worth noting that additional variables passed via
'...' to get_all_vars() are subject to vector recycling whereas those
passed to model.frame.default() are not:

> str(get_all_vars(mpg ~ wt, data = mtcars, new = 2), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt : num  2.62 2.88 2.32 3.21 3.44 ...
 $ new: num  2 2 2 2 2 2 2 2 2 2 ...

> str(get_all_vars(mpg ~ wt, data = mtcars, new = rep(2, nrow(mtcars))), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt : num  2.62 2.88 2.32 3.21 3.44 ...
 $ new: num  2 2 2 2 2 2 2 2 2 2 ...

> str(model.frame.default(mpg ~ wt, data = mtcars, new = rep(2, nrow(mtcars))), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg  : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt   : num  2.62 2.88 2.32 3.21 3.44 ...
 $ (new): num  2 2 2 2 2 2 2 2 2 2 ...

> str(model.frame.default(mpg ~ wt, data = mtcars, new = 2), give.attr = FALSE)
Error in model.frame.default(mpg ~ wt, data = mtcars, new = 2) :
  variable lengths differ (found for '(new)')

But, maybe that's something for the "Details" section? (Or it's a bug
- I don't really know.)

Thanks in advance for your consideration.

Best,
-Thomas

Thomas J. Leeper
http://www.thomasleeper.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] A trap for young players with the lapply() function.

2017-03-26 Thread Rolf Turner


From time to time I get myself into a state of bewilderment when using
lapply() by calling it with FUN equal to a function which has an
"optional" argument named "X".


E.g.

xxx <- lapply(y,function(x,X){cos(x*X)},X=2*pi)

which produces the error message


Error in get(as.character(FUN), mode = "function", envir = envir) :
  object 'y' of mode 'function' was not found


This of course happens because the name of the first argument of 
lapply() is "X" and so it takes the value of this first argument to be 
the supplied X (2*pi in the foregoing example) and then expects what the 
user has denoted by "y" to be the value of FUN, and (obviously!) it isn't.


Once one realises what is going on, it's all quite obvious, and usually
pretty easy to fix.  OTOH there are lots of functions around with second
or third arguments whose formal name is "X", and these can trip one up
until the penny drops.
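
For what it's worth, the usual fix can be sketched as follows: wrap the
call in an anonymous function so that "X" is supplied inside the wrapper
and never travels through lapply()'s own argument matching. The data in
y and the names f, res_bad and res_ok are made up for illustration:

```r
y <- list(0.25, 0.5, 1)
f <- function(x, X) cos(x * X)

# Problem: the named argument X = 2 * pi is matched to lapply()'s own
# first formal (X), so y is then taken to be FUN and the call fails.
res_bad <- tryCatch(lapply(y, f, X = 2 * pi), error = function(e) e)
inherits(res_bad, "error")

# Workaround: supply X inside a wrapper, so lapply() only ever sees
# its own two arguments.
res_ok <- lapply(y, function(x) f(x, X = 2 * pi))
unlist(res_ok)
```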

This keeps happening to me, over and over again (with sufficiently long
intervals between occurrences so that my ageing memory forgets the 
previous occurrence).


Is there any way to trap/detect the use of an optional argument called 
"X" and thereby issue a more perspicuous error message?


This would be helpful to those users who, like myself, are bears of very 
little brain.


Failing that (it does look impossible) might it not be a good idea to 
add a warning to the help for lapply(), to the effect that if FUN has an 
optional argument named "X" then passing this argument via "..." will 
cause this argument to be taken as the first argument to lapply() and 
thereby induce an error?
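
A user-level alternative, sketched here only as an assumption about
what might help, is a thin wrapper whose own formals are unlikely to
clash with those of FUN. The name lapply_dots and its formals .x and
.f are hypothetical and not part of base R:

```r
# Hypothetical helper: its formals are .x and .f, so a named argument
# X = ... falls through to '...' instead of being captured.
lapply_dots <- function(.x, .f, ...) {
  .f <- match.fun(.f)
  # Forward '...' to .f inside a closure, keeping it away from
  # lapply()'s own argument matching.
  lapply(.x, function(el) .f(el, ...))
}

res <- lapply_dots(list(0.25, 0.5, 1), function(x, X) cos(x * X), X = 2 * pi)
unlist(res)
```

This does not improve the error message from lapply() itself; it merely
sidesteps the name clash, and a function with an argument called ".x"
or ".f" would run into the same trap again.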


cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
