Colleagues,

My interest is not in writing ad hoc functions (which I might use once to 
analyze my data), but rather in writing what I will call a system function, 
one that might be part of a package. The lm function is a paradigm of what I 
call a system function. 

The lm function begins by processing the arguments passed to the function 
(represented in the function as parameters; see the code below). Much of this 
processing is only peripherally related to running a regression, but the code 
is necessary to determine exactly what the user of the system function wants 
the function to do. It would be helpful if there were a document that 
described best practices for writing system functions, with clear explanations 
of what each step in a system function is designed to do and how each line 
accomplishes its task. It would also be nice if the system function had 
documentation. I have pushed my way through the lm function, and with the help 
of the R help files I have come to understand how the function works, but this 
is not an efficient way to learn the best practices that should be used when 
writing a system function. 

Perhaps there is a document that does what I would like to see done, but I do 
not know of one.  

John

lm
function (formula, data, subset, weights, na.action, method = "qr",
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
    contrasts = NULL, offset, ...)
{
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "weights", "na.action",
        "offset"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- quote(stats::model.frame)
    mf <- eval(mf, parent.frame())
    if (method == "model.frame")
        return(mf)
    else if (method != "qr")
        warning(gettextf("method = '%s' is not supported. Using 'qr'",
            method), domain = NA)
    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    w <- as.vector(model.weights(mf))
    if (!is.null(w) && !is.numeric(w))
        stop("'weights' must be a numeric vector")
    offset <- model.offset(mf)
    mlm <- is.matrix(y)
    ny <- if (mlm)
        nrow(y)
    else length(y)
    if (!is.null(offset)) {
        if (!mlm)
            offset <- as.vector(offset)
        if (NROW(offset) != ny)
            stop(gettextf("number of offsets is %d, should equal %d (number of observations)",
                NROW(offset), ny), domain = NA)
    }
    if (is.empty.model(mt)) {
        x <- NULL
        z <- list(coefficients = if (mlm) matrix(NA_real_, 0,
            ncol(y)) else numeric(), residuals = y, fitted.values = 0 *
            y, weights = w, rank = 0L, df.residual = if (!is.null(w)) sum(w !=
            0) else ny)
        if (!is.null(offset)) {
            z$fitted.values <- offset
            z$residuals <- y - offset
        }
    }
    else {
        x <- model.matrix(mt, mf, contrasts)
        z <- if (is.null(w))
            lm.fit(x, y, offset = offset, singular.ok = singular.ok,
                ...)
        else lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
            ...)
    }
    class(z) <- c(if (mlm) "mlm", "lm")
    z$na.action <- attr(mf, "na.action")
    z$offset <- offset
    z$contrasts <- attr(x, "contrasts")
    z$xlevels <- .getXlevels(mt, mf)
    z$call <- cl
    z$terms <- mt
    if (model)
        z$model <- mf
    if (ret.x)
        z$x <- x
    if (ret.y)
        z$y <- y
    if (!qr)
        z$qr <- NULL
    z
}
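
The argument-handling idiom at the top of lm can be tried in isolation. What follows is only a minimal sketch: my_frame is a hypothetical function (it is not part of R or of any package), it keeps just four of lm's arguments, and it uses the built-in mtcars data for illustration.

```r
# Hypothetical function that reuses the argument-handling idiom from lm().
my_frame <- function(formula, data, subset, weights, ...) {
    cl <- match.call()                     # the call as the user wrote it
    mf <- match.call(expand.dots = FALSE)  # same call, with ... kept as one element
    # Positions of the arguments that model.frame() understands (0L if absent).
    m <- match(c("formula", "data", "subset", "weights"), names(mf), 0L)
    mf <- mf[c(1L, m)]                     # keep the function name plus those arguments
    mf[[1L]] <- quote(stats::model.frame)  # re-target the call at model.frame()
    mf <- eval(mf, parent.frame())         # evaluate it where the user called us
    list(call = cl, model = mf)
}

# 'junk' travels in ... and is dropped before model.frame() is called.
res <- my_frame(mpg ~ wt, data = mtcars, subset = cyl == 4, junk = 1)
names(res$model)   # "mpg" "wt"
nrow(res$model)    # 11
```

The point of rebuilding the call rather than calling model.frame directly is that arguments the user omitted stay omitted, and expressions such as subset are passed on unevaluated.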



John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical 
Center Geriatrics Research, Education, and Clinical Center; 
PI Biostatistics and Informatics Core, University of Maryland School of 
Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382




________________________________________
From: Jorgen Harmse <jhar...@roku.com>
Sent: Tuesday, January 7, 2025 1:47 PM
To: r-help@r-project.org; ikwsi...@gmail.com; Bert Gunter; Sorkin, John; 
jdnew...@dcn.davis.ca.us
Subject: Re: Extracting specific arguments from "..."

Interesting discussion. A few things occurred to me.

Apologies to Iris Simmons: I mixed up his answer with Bert's question.

Bert raises questions about promises, and I think they are related to John 
Sorkin's question. A big difference between R and most other languages is that 
function arguments are computed lazily. match.call & substitute tell us what 
expressions will be evaluated if function arguments are needed but not the 
environments in which that will happen. The usual suspects are environment() 
and parent.frame(), but parent.frame(k) & maybe even other environments are 
possible. If you are really determined then I guess you can keep evaluating 
match.call() in parent frames until you have accounted for all the inputs.
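
A small sketch of that point, with hypothetical helper names (inspect, outer_fn) chosen only for illustration: substitute returns the expression, but the value depends on the environment in which the expression is evaluated.

```r
# Hypothetical helper: reports the expression supplied for x and its value.
inspect <- function(x) {
    expr <- substitute(x)          # the expression only, no environment attached
    list(expr = expr, value = x)   # using x evaluates it in the caller's frame
}

outer_fn <- function() {
    a <- 100
    inspect(a + 1)                 # the caller's frame is outer_fn's, where a is 100
}

a <- 10
res <- outer_fn()
res$expr                           # a + 1
res$value                          # 101
top_value <- eval(res$expr)        # same expression, evaluated here where a is 10
top_value                          # 11
```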

It's not clear to what extent John Sorkin is concerned about writing functions 
as opposed to using functions. Lazy computation has advantages but leads to 
some issues:

- Exactly matching the function's default expression for an input is not 
  necessarily the same as omitting the input. The evaluation environment is 
  different.
- If the caller uses an expression with side effects then there is no 
  guarantee that the side effects will happen. If there are side effects from 
  two or more inputs then the order is uncertain. (If an argument is not 
  supplied and the default has side effects then they might not happen either. 
  However, I don't know why the function writer would specify any side effect 
  except stop(), and then he or she has probably arranged for it to happen 
  exactly when it should.)
- If a default value depends on another input and that input is modified 
  inside the function then the order of evaluation of inputs becomes 
  important. Even if you know exactly what you're doing when you write the 
  function, you should make it clear to future maintainers. An explicit call 
  to force clarifies that the input needs to be computed with the existing 
  values of anything that is used in the default, even if the code is 
  refactored so that the value is not used immediately. If you really want to 
  modify another input before evaluating the default then specify that in a 
  comment.
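
The three issues above can be reproduced in a few lines. This is only an illustrative sketch; the function names (shorten, maybe_use, shorten_forced) and the variable trace_log are made up for the example.

```r
# 1. Matching the default expression is not the same as omitting the input.
shorten <- function(x, n = length(x)) {
    x <- x[-1]   # x is modified before n is first used
    n            # a default n is evaluated only now, inside the function
}
x <- 1:5
shorten(x)                  # 4: default evaluated in the function, after x changed
shorten(x, n = length(x))   # 5: same expression, evaluated in the caller's frame

# 2. Side effects in an argument happen only if the argument is actually used.
maybe_use <- function(flag, value) if (flag) value else NULL
trace_log <- character()
maybe_use(FALSE, { trace_log <- c(trace_log, "evaluated"); 1 })
n_after_unused <- length(trace_log)   # 0: the expression was never evaluated
maybe_use(TRUE, { trace_log <- c(trace_log, "evaluated"); 1 })
n_after_used <- length(trace_log)     # 1

# 3. force() fixes the default using the values that exist at that point.
shorten_forced <- function(x, n = length(x)) {
    force(n)     # evaluate n now, before x is modified
    x <- x[-1]
    n
}
shorten_forced(x)           # 5
```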

Jeff Newmiller makes a good point. You can still change your mind about 
inspecting a particular input without breaking old code that uses your 
function, and you don’t necessarily need default values.

Old definition:
f <- function(...) { <code that passes ... to other functions and does some other things> }

New definition:
f <- function(..., a = <default expression, possibly stop()>)
{ <pass ..., a=a to another function>
  <do something with the output>
}

OR

f <- function(..., a)
{ if (missing(a)) # OK, this becomes clunky if there are several such inputs
  { <pass ... to another function> }
  else
  { <inspect or modify a> # Pitfall: Changing the order of evaluation may break
                          # old code, but then the design was probably too
                          # devious in the first place.
    <pass ..., a=a to another function>
  }
  <do something with the output>
}
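
To make the pattern concrete, here is one possible runnable instance. It is a sketch under assumptions: paste stands in for "another function", sep plays the role of the input a, and the names f_old, f_new1 and f_new2 are hypothetical.

```r
# Old definition: everything in ... is forwarded untouched.
f_old <- function(...) {
    toupper(paste(...))
}

# New definition, first form: sep becomes a named input with a default.
f_new1 <- function(..., sep = " ") {
    toupper(paste(..., sep = sep))
}

# New definition, second form: no default; missing() decides what to forward.
f_new2 <- function(..., sep) {
    if (missing(sep)) {
        out <- paste(...)
    } else {
        # Inspect the input before passing it on.
        if (!is.character(sep) || length(sep) != 1L)
            stop("'sep' must be a single character string")
        out <- paste(..., sep = sep)
    }
    toupper(out)
}

# Calls written against the old definition give the same result with the new ones.
f_old("a", "b", sep = "-")    # "A-B"
f_new1("a", "b", sep = "-")   # "A-B"
f_new2("a", "b", sep = "-")   # "A-B"
f_new2("a", "b")              # "A B"
```

Because sep comes after ... in the new definitions, callers must spell the name out in full, which is also what paste itself requires.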

Regards,
Jorgen Harmse.




______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
