John,
The text below is cut out of a "how to write a package" course I gave at the R
conference in Vanderbilt. I need to find a home for the course notes, because
it had a lot of tidbits that are not well explained in the R documentation.
Terry T.
----
Model frames:
One of the first tasks of any modeling routine is to construct a special data
frame containing the covariates that will be used, via a call to the model.frame
function. The code to do this is found in many routines, and can be a little
opaque on first view. The obvious code would be
\begin{verbatim}
coxph <- function(formula, data, weights, subset, na.action,
                  init, control, ties= c("efron", "breslow", "exact"),
                  singular.ok =TRUE, robust=FALSE,
                  model=FALSE, x=FALSE, y=TRUE, tt, method=ties, ...) {
    mf <- model.frame(formula, data, subset, weights, na.action)
\end{verbatim}
since those are the coxph arguments that are passed forward to the model.frame
routine. However, this simple approach will fail with a ``not found'' error
message if any of the data, subset, weights, etc. arguments are missing.
Programs have to take the slightly more complicated approach of constructing a
call.
\begin{verbatim}
Call <- match.call()
indx <- match(c("formula", "data", "weights", "subset", "na.action"),
              names(Call), nomatch=0)
if (indx[1] == 0) stop("A formula argument is required")
temp <- Call[c(1,indx)]              # only keep the arguments we wanted
temp[[1]] <- as.name('model.frame')  # change the function called
mf <- eval(temp, parent.frame())
Y <- model.response(mf)
# etc.
\end{verbatim}
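To make the fragment above easy to try, here is a minimal self-contained
sketch of the same pattern. The function name toy_fit and the data frame dat
are made up for illustration and are not part of any package; the body simply
repeats the steps shown above, assuming only base R and the stats package.

```r
# 'toy_fit' is a hypothetical stand-in for a modeling function such as coxph.
toy_fit <- function(formula, data, weights, subset, na.action) {
    Call <- match.call()
    indx <- match(c("formula", "data", "weights", "subset", "na.action"),
                  names(Call), nomatch = 0)
    if (indx[1] == 0) stop("A formula argument is required")
    temp <- Call[c(1, indx)]             # keep only the arguments we want
    temp[[1]] <- as.name("model.frame")  # change the function called
    mf <- eval(temp, parent.frame())     # evaluate in the caller's frame
    list(call = Call, mf = mf, y = model.response(mf))
}

# Made-up data, for illustration only.
dat <- data.frame(y = c(1.2, 2.3, 3.1, 4.8, 5.0),
                  x = c(1, 2, 3, 4, 5),
                  w = c(1, 1, 2, 2, 1))

# Optional arguments omitted: no error, they are simply left out of the call.
fit1 <- toy_fit(y ~ x, data = dat)

# 'w' and 'x > 2' are looked up inside 'dat', not in the global environment.
fit2 <- toy_fit(y ~ x, data = dat, weights = w, subset = x > 2)
names(fit2$mf)
model.weights(fit2$mf)
```

Note that w does not exist as a global variable in this sketch; it is found
only because model.frame evaluates the weights argument within the data frame.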
We start with a copy of the call to the program, which we want to save anyway
as documentation in the output object. Subscripting is then used to extract
only the portions of the call that we want, saving the result in a temporary
variable. This works because a call object can be viewed as a list whose first
element is the name of the function to call, followed by the arguments to the
call. Note the use of \code{nomatch=0}: a zero subscript selects nothing, so
any arguments on the list that were not supplied in the call are simply absent
from \code{temp}, without generating an error message. The \code{temp} variable
will contain an object of type ``call'', which is an unevaluated call to a
routine. Finally, the name of the function to be called is changed from
``coxph'' to ``model.frame'' and the call is evaluated. In many of the core
routines the result is stored in a variable ``m''. This is a horribly short and
non-descriptive name. (The code above uses ``mf'', which isn't much better.)
Many routines also use ``m'' for the temporary variable, leading to
\code{m <- eval(m, parent.frame())}, but I think that is unnecessarily
confusing.
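The list-like behavior of a call object can be seen without evaluating
anything. The sketch below builds a call by hand with quote(); the particular
formula and the data set name lung are only placeholders, and since the call
is never evaluated no extra package is needed.

```r
# An unevaluated call, written out by hand for illustration.
cl <- quote(coxph(formula = Surv(time, status) ~ age, data = lung,
                  robust = TRUE))

class(cl)    # a "call" object
cl[[1]]      # first element: the name of the function to call
names(cl)    # the function name has an empty name, arguments keep theirs

# 'weights' was not supplied, so its position is 0 and selects nothing.
indx <- match(c("formula", "data", "weights"), names(cl), nomatch = 0)

temp <- cl[c(1, indx)]               # still a call, minus 'robust'
temp[[1]] <- as.name("model.frame")  # swap the function name
temp                                 # print the modified, unevaluated call
```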
The list of names in the match call should include every argument that is to be
evaluated within the context of the named data frame. This can include more
than the list above. The survfit routine, for instance, has an optional
argument ``id'' that names an identifying variable (several rows of the data
may represent a single subject), and this is included along with ``formula''
etc. in the list of choices in the match function. The order of names in the
list makes no difference. The id is later retrieved with
\code{model.extract(m, 'id')}, where \code{m} is the model frame; the result
will be NULL if the argument was not supplied. At the time that coxph was
written I had not caught on to this fact and thought that all variables that
came from a data frame had to be represented in the formula somehow; thus the
use of \code{cluster(id)} as part of the formula to denote a grouping variable.
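As a rough illustration of the ``id'' idea, the sketch below adds an id
argument to a made-up function. The names toy_fit2, dat2 and subject are
hypothetical, and this is not the survfit code itself, only the same pattern
with one more name in the match list.

```r
# Hypothetical function; 'id' is added to the names passed to model.frame.
toy_fit2 <- function(formula, data, weights, subset, na.action, id) {
    Call <- match.call()
    indx <- match(c("formula", "data", "weights", "subset", "na.action", "id"),
                  names(Call), nomatch = 0)
    if (indx[1] == 0) stop("A formula argument is required")
    temp <- Call[c(1, indx)]
    temp[[1]] <- as.name("model.frame")
    mf <- eval(temp, parent.frame())
    # model.extract looks for the column "(id)"; NULL if 'id' was not given
    list(mf = mf, id = model.extract(mf, "id"))
}

# Made-up data: two rows per subject.
dat2 <- data.frame(y = c(3, 5, 4, 8),
                   x = c(1, 2, 1, 3),
                   subject = c(1, 1, 2, 2))

with_id <- toy_fit2(y ~ x, data = dat2, id = subject)
no_id   <- toy_fit2(y ~ x, data = dat2)

names(with_id$mf)   # includes the extra column "(id)"
with_id$id          # the subject values, taken from dat2
is.null(no_id$id)   # TRUE: the argument was not supplied
```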
On 5/11/19 5:00 AM, [email protected] wrote:
> A number of people have helped me in my mission to understand how lm (and
> other functions) are able to pass a dataframe and then refer to a specific
> column in the dataframe. I thank everyone who has responded. I now know a bit
> about deparse(substitute(xx)), but I still don't fully understand how it
> works. The program below attempts to print a column of a dataframe from a
> function whose parameters include the dataframe (df) and the column requested
> (col). The program works fine until the last print statement where I receive
> an error, Error in `[.data.frame`(df, , col) : object 'y' not found. I hope
> someone can explain to me (1) why my code does not work, and (2) what I can
> do to fix it.