Re: [Rd] Time to revisit ifelse ?

Mikael Jagan Fri, 01 Aug 2025 10:30:18 -0700

Andre,

There is a new thread (of length one, sadly), which you should read:


    https://stat.ethz.ch/pipermail/r-devel/2025-July/084113.html

The function of mine that you have been testing was just a fast prototype,
and much work has been done in the mean time.  Can you give the current
proposal (ifelse::ifelse1) a try and let us know if anything stands out?

Mikael

On 2025-08-01 1:13 pm, GILLIBERT, Andre wrote:

Martin Maechler <maech...@stat.math.ethz.ch> wrote:

I don't mind putting together a minimal package with some prototypes, tests,
comparisons, etc.  But perhaps we should aim for consensus on a few issues
beforehand.  (Sorry if these have been discussed to death already elsewhere.
In that case, links to relevant threads would be helpful ...)

      1. Should the type and class attribute of the return value be exactly the
         type and class attribute of c(yes[0L], no[0L]), independent of 'test'?
         Or something else?

      2. What should be the attributes of the return value (other than 'class')?

         base::ifelse keeps attributes(test) if 'test' is atomic, which seems
         like desirable behaviour, though dplyr and data.table seem to think
         otherwise:


In my experience, base::ifelse keeping attributes of 'test' is useful for names.
It may also be useful for dimensions, but for other attributes, it may be a 
dangerous feature.
Otherwise, attributes of c(yes, no) should be mostly preserved in my opinion.

3. Should the new function be stricter and/or more verbose?  E.g., should
         it signal a condition if length(yes) or length(no) is not equal to 1
         nor length(test)?


To be consistent with base R, it should warn if length(yes), length(no) and 
length(test) are not divisors of the longest, otherwise silently repeat the 
three vectors to get the same sizes.
This would work consistently with mathematical operators such as test+yes+no.

In my personal experience, the truncation of 'yes' and 'no' to length(test) if 
the most dangerous feature of ifelse().

      4. Should the most common case, in which neither 'yes' nor 'no' has a
         'class' attribute, be handled in C?  The remaining cases might rely on
        method dispatch and thus require a separate "generic" implementation in
      R.  How much faster/more efficient would the C implementation have to
        be to justify the cost (more maintenance for R-core, more obfuscation
       for the average user)?


If the function is not much slower than today ifelse(), it is not worth 
rewriting in C in my opinion.

Thank you for an implementation!
A few examples of misbehaviors (in my opinion):

ifelse2(c(a=TRUE), factor("a"), factor("b"))

Error in as.character.factor(x) : malformed factor

ifelse2(TRUE, factor(c("a","b")), factor(c("b","a")))

[1] a
Levels: a b

I would expect this one to output
[1] a b
Levels: a b

I tried to develop a function that behaves like mathematical operators (e.g. 
test+yes+no) for length & dimensions coercion rules.
Please, find the function and a few tests below:

ifelse2 <- function (test, yes, no) {
        # forces evaluation of arguments in order
        test
        yes
        no

        if (is.atomic(test)) {
                if (!is.logical(test))
                        storage.mode(test) <- "logical"
        }
        else test <- if (isS4(test)) methods::as(test, "logical") else 
as.logical(test)

        ntest <- length(test)
        nyes <- length(yes)
        nno <- length(no)

        nn <- c(ntest, nyes, nno)
       nans <- max(nn)

        ans <- rep(c(yes[0L], no[0L]), length.out=nans)

        # check dimension consistency for arrays
        has.dim <- FALSE
        if (length(dim(test)) | length(dim(yes)) | length(dim(no))) {
                lparams <- list(test, yes, no)
                ldims <- lapply(lparams, dim)
                ldims <- ldims[!sapply(ldims, is.null)]
                ldimnames <- lapply(lparams, dimnames)
                ldimnames <- ldimnames[!sapply(ldimnames, is.null)]

                rdim <- ldims[[1]]
                rdimnames <- ldimnames[[1]]
                for(d in ldims) {
                        if (!identical(d, rdim)) {
                                stop(gettext("non-conformable arrays"))
                        }
                }
                has.dim <- TRUE
        }

        if (any(nans %% nn)) {
                warning(gettext("longer object length is not a multiple of shorter 
object length"))
        }

        if (ntest != nans) {test <- rep(test, length.out=nans)}
        if (nyes != nans) {yes <- rep(yes, length.out=nans)}
        if (nno != nans) {no <- rep(no, length.out=nans)}

        idx <- which( test)
        ans[idx] <- yes[idx]

        idx <- which(!test)
        ans[idx] <- no[idx]

        if (has.dim) {
                dim(ans) <- rdim
                dimnames(ans) <- rdimnames
        }

        if (!is.null(names(test))) {
                names(ans) <- names(test)
        }

        ans
}


ifelse2(c(alpha=TRUE,beta=TRUE,gamma=FALSE),factor(c("A","B","C","X")),factor(c("A","B","C","D")))
ifelse2(c(TRUE,FALSE), as.Date("2025-04-01"), c("2020-07-05", "2022-07-05"))
ifelse2(c(a=TRUE, b=FALSE,c=TRUE,d=TRUE), list(42), list(40,45))
ifelse2(rbind(alpha=c(a=TRUE, b=FALSE),beta=c(c=TRUE,d=FALSE)), list(1:10), 
list(2:20,3:30))
a=rbind(alpha=c(a=TRUE, b=FALSE),beta=c(TRUE,TRUE))
b=rbind(ALPHA=c(A=TRUE, B=FALSE),BETA=c(C=TRUE,D=TRUE))
c=rbind(ALPHA2=c(A2=TRUE, B2=FALSE),BETA2=c(C2=TRUE,D2=TRUE))
ifelse2(a,b,c)
dimnames(a) <- NULL
ifelse2(a,b,c)
dimnames(b) <- NULL
ifelse2(a,b,c)


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Time to revisit ifelse ?

Reply via email to