I agree a note about this sort of change might be good.

I think it wouldn't be too hard to write such a check to detect simple assignments using <- or =. If you also wanted to detect method redefinitions, or redefinitions of functions stored in lists, etc., it would be harder.

There's unexported code in the pkgload package that will get you the list of R files in the correct collation order: pkgload:::find_code . I don't know of such a function exported by some other package, but there might be one. Once you have that list, you could parse each file and look for top level assignments to a name, then look for duplicates in the vector of names.

Here's a little script that finds cases where an R source file makes an assignment to a variable with the same name as one that was used earlier:

# Assume we are in the top level directory of a package.
Rfiles <- pkgload:::find_code()

allnames <- character()
for (f in Rfiles) {
  exprs <- parse(f)
  names <- character(length(exprs))
  for (i in seq_along(exprs)) {
    expr <- exprs[[i]]
    if (is.name(expr[[1]]) &&
        deparse(expr[[1]]) %in% c("<-", "=") &&
        is.name(expr[[2]])) {
      names[i] <- deparse(expr[[2]])
    }
  }
  names <- names[names != ""]
  prev <- length(allnames)
  allnames <- c(allnames, names)
  dups <- which(duplicated(allnames))
  dups <- dups[dups > prev]
  if (any(dups)) {
    cat("Duplicated names in ", basename(f), ":\n")
    cat(paste(allnames[dups], collapse = ", "), "\n")
  }
}

It could be made more fancy to report the locations of both the original and the dup if you feel motivated.

Duncan Murdoch

On 06/02/2024 8:09 p.m., Chris Black wrote:
Hopefully to too much of a tangent: A related problem this check doesn’t catch 
is accidental top-level redefinitions in package code, such as

## a.R:
helper <- function() 1

f <- function() {
helper()
}
# “cool, f() must return 1"

## b.R:
helper <- function(x) 2

g <- function() {
helper()
}
# “cool, g() must return 2"

## Runtime:
# > c(pkg::f(), pkg::g())
# [1] 2 2
# “oh right, only the last definition of helper() is used”
I’ve seen several variants of this issue in code from folks who are new to package development, especially if they're naively refactoring something that started out as an interactively-run analysis. Collaborators who are puzzled by it get my “packages are collections of objects not sequences of expressions, yes that needs to be in your mental model, here’s the link to RWE again” talk, but I would be happy to be able to point them to a check result to go along with it.

I don’t think this is grounds on its own to change a 20-year precedent, but in 
case anyone is collecting wishlist reasons to make the check look harder...

Thanks,
Chris

On Feb 6, 2024, at 3:17 PM, Martin Morgan <mtmorgan....@gmail.com> wrote:

I went looking and found this in codetools, where it's been for 20 years

https://gitlab.com/luke-tierney/codetools/-/blame/master/R/codetools.R?ref_type=heads#L951

I think the call stack in codetools is checkUsagePackage -> checkUsageEnv -> 
checkUsage, and these are similarly established. The call from the tools package 
https://github.com/wch/r-source/blame/95146f0f366a36899e4277a6a722964a51b93603/src/library/tools/R/QC.R#L4585
 is also quite old.

I'm not sure this had been said explicitly, but perhaps the original intent was 
to protect against accidentally redefining a local function. Obviously one 
could do this with a local variable too, though that might less often be an 
error…

toto <- function(mode) {
    tata <- function(a, b) a * b  # intended
    tata <- function(a, b) a / b  # oops
    …
}

Another workaround is to actually name the local functions

toto <- function(mode) {
    tata <- function(a, b) a * b
    titi <- function(u, v, w) (u + v) / w
    if (mode == 1)
        tata
    else
        titi
}

… or to use a switch statement

toto <- function(mode) {
    ## fun <- switch(…) for use of `fun()` in toto
    switch(
        mode,
        tata = function(a, b) a * b,
        titi = function(u, v, w) (u + v) / w,
        stop("unknown `mode = '", mode, "'`")
    )
}

… or similarly to write `fun <- if … else …`, assigning the result of the `if` 
to `fun`. I guess this last formulation points to the fact that a more careful 
analysis of Hervé's original code means that `fun` can only take one value (only 
one branch of the `if` can be taken) so there can only be one version of `fun` in 
any invocation of `toto()`.

Perhaps the local names (and string-valued 'mode') are suggestive of special 
case, so serve as implicit documentation?

Adding `…` to `tata` doesn't seem like a good idea; toto(1)(3, 5, 7) no longer 
signals an error.

There seems to be a lot in common with S3 and S4 methods, where `toto` corresponds to 
the generic, `tata` and `titi` to methods. This 'dispatch' is brought out by using 
`switch()`. There is plenty of opportunity for thinking that you're invoking one 
method but actually you're invoking the other. For instance with dplyr, I like that I 
can tbl |> print(n = 2) so much that I find myself doing this with data.frame df 
|> print(n = 2), which is an error (`n` partially matches `na.print`, and 2 is not 
a valid value); both methods silently ignore the typo print(m = 2).

Martin Morgan

From: R-devel <r-devel-boun...@r-project.org> on behalf of Henrik Bengtsson 
<henrik.bengts...@gmail.com>
Date: Tuesday, February 6, 2024 at 4:34 PM
To: Izmirlian, Grant (NIH/NCI) [E] <izmir...@mail.nih.gov>
Cc: r-devel@r-project.org <r-devel@r-project.org>
Subject: Re: [Rd] [EXTERNAL] Re: NOTE: multiple local function definitions for 
?fun? with different formal arguments
Here's a dummy example that I think illustrates the problem:

toto <- function() {
  if (runif(1) < 0.5)
    function(a) a
  else
    function(a,b) a+b
}

fcn <- toto()
fcn(1,2)
[1] 3
fcn <- toto()
fcn(1,2)
[1] 3
fcn <- toto()
fcn(1,2)
Error in fcn(1, 2) : unused argument (2)

How can you use the returned function, if you get different arguments?

In your example, you cannot use the returned function without knowing
'mode', or by inspecting the returned function.  So, the warning is
there to alert you to a potential bug.  Anecdotally, I'm pretty sure
this R CMD check NOTE has caught at least one such bug in one of
my/our packages.

If you want to keep the current design pattern, one approach could be
to add ... to your function definitions:

toto <- function(mode)
{
     if (mode == 1)
         fun <- function(a, b, ...) a*b
     else
         fun <- function(u, v, w) (u + v) / w
     fun
}

to make sure that toto() returns functions that accept the same
minimal number of arguments.

/Henrik

On Tue, Feb 6, 2024 at 1:15 PM Izmirlian, Grant (NIH/NCI) [E] via
R-devel <r-devel@r-project.org> wrote:

Because functions get called and therefore, the calling sequence matters. It’s 
just protecting you from yourself, but as someone pointed out, there’s a way to 
silence such notes.
G


From: Hervé Pagès <hpages.on.git...@gmail.com>
Sent: Tuesday, February 6, 2024 2:40 PM
To: Izmirlian, Grant (NIH/NCI) [E] <izmir...@mail.nih.gov>; Duncan Murdoch 
<murdoch.dun...@gmail.com>; r-devel@r-project.org
Subject: Re: [EXTERNAL] Re: [Rd] NOTE: multiple local function definitions for 
?fun? with different formal arguments


On 2/6/24 11:19, Izmirlian, Grant (NIH/NCI) [E] wrote:
The note refers to the fact that the function named ‘fun’ appears to be defined 
in two different ways.

Sure I get that. But how is that any different from a variable being defined in 
two different ways like in

    if (mode == 1)
        x <- -8
    else
        x <- 55

This is such a common and perfectly fine pattern. Why would this be considered 
a potential hazard when the variable is a function?

H.

From: Hervé Pagès 
<hpages.on.git...@gmail.com><mailto:hpages.on.git...@gmail.com>
Sent: Tuesday, February 6, 2024 2:17 PM
To: Duncan Murdoch <murdoch.dun...@gmail.com><mailto:murdoch.dun...@gmail.com>; Izmirlian, Grant 
(NIH/NCI) [E] <izmir...@mail.nih.gov><mailto:izmir...@mail.nih.gov>; 
r-devel@r-project.org<mailto:r-devel@r-project.org>
Subject: [EXTERNAL] Re: [Rd] NOTE: multiple local function definitions for 
?fun? with different formal arguments


Thanks. Workarounds are interesting but... what's the point of the NOTE in the 
first place?

H.
On 2/4/24 09:07, Duncan Murdoch wrote:
On 04/02/2024 10:55 a.m., Izmirlian, Grant (NIH/NCI) [E] via R-devel wrote:


Well you can see that yeast is exactly weekday you have.  The way out is to 
just not name the result

I think something happened to your explanation...




toto <- function(mode)
{
     ifelse(mode == 1,
         function(a,b) a*b,
         function(u, v, w) (u + v) / w)
}

It's a bad idea to use ifelse() when you really want if() ... else ... .  In 
this case it works, but it doesn't always.  So the workaround should be


toto <- function(mode)
{
    if(mode == 1)
        function(a,b) a*b
    else
        function(u, v, w) (u + v) / w
}






________________________________
From: Grant Izmirlian <izmirlidr...@gmail.com><mailto:izmirlidr...@gmail.com>
Date: Sun, Feb 4, 2024, 10:44 AM
To: "Izmirlian, Grant (NIH/NCI) [E]" 
<izmir...@mail.nih.gov><mailto:izmir...@mail.nih.gov>
Subject: Fwd: [EXTERNAL] R-devel Digest, Vol 252, Issue 2

Hi,

I just ran into this 'R CMD check' NOTE for the first time:

* checking R code for possible problems ... NOTE
toto: multiple local function definitions for �fun� with different
   formal arguments

The "offending" code is something like this (simplified from the real code):

toto <- function(mode)
{
     if (mode == 1)
         fun <- function(a, b) a*b
     else
         fun <- function(u, v, w) (u + v) / w
     fun
}

Is that NOTE really intended? Hard to see why this code would be
considered "wrong".

I know it's just a NOTE but still...

I agree it's a false positive, but the issue is that you have a function object 
in your function which can't be called unconditionally.  The workaround doesn't 
create such an object.

Recognizing that your function never tries to call fun requires global 
inspection of toto(), and most of the checks are based on local inspection.

Duncan Murdoch

______________________________________________
R-devel@r-project.org<mailto:R-devel@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

--

Hervé Pagès



Bioconductor Core Team

hpages.on.git...@gmail.com<mailto:hpages.on.git...@gmail.com>
CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you recognize the sender and are confident the 
content is safe.


--

Hervé Pagès



Bioconductor Core Team

hpages.on.git...@gmail.com<mailto:hpages.on.git...@gmail.com>
CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you recognize the sender and are confident the 
content is safe.


        [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to