The other problem in this example is setting NAs. replace(x, x == 0, NA)
requires two instances of x, which makes it not very pipe friendly. In dplyr
there is na_if() to address that problem. Now that base R has a native pipe,
it might be worth having something in base R that addresses this too, so that
we don't have to define our own zero2na.

On Tue, Sep 17, 2024 at 12:14 PM Martin Maechler
<maech...@stat.math.ethz.ch> wrote:
>
> >>>>> Gabor Grothendieck
> >>>>>     on Mon, 16 Sep 2024 11:21:55 -0400 writes:
>
> > Suppose we have `dat` shown below and we want to find the `y` value
> > corresponding to the last value in `x` equal to the corresponding
> > component of `seek`, and we wish to return an output the same length as
> > `seek`, using `findInterval` to perform the search. This returns the
> > correct result:
>
> > dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
> >                   y = c(37, 12, 19, 30, 6, 15),
> >                   seek = 1:6)
>
> > zero2na <- function(x) replace(x, x == 0, NA)
>
> > dat |>
> >   transform(result = y[ zero2na(findInterval(seek, x)) ] ) |>
> >   _$result
> > ## [1] NA 12 19 15 15 15
>
> I'd write that as
>
>   with(dat, y[ zero2na(findInterval(seek, x)) ] )
>
> so I can read it without jumping through hoops and standing on my head ...
>
> > Since `findInterval` returns an index, it is natural that the next step be
> > to use the index, and it is also common that we want a result that is the
> > same length as the input.
>
> I think your example, where x and y are of the same length, is
> not typical.
>
> Note that the design of findInterval(x, vec, ..) is indeed to always return
> an index, but there isn't any "nomatch"; rather, there is a
>  - "left of the leftmost", i.e., an x[i] < vec[1] (as 'vec' must be
>    sorted increasingly), or
>  - "right of the rightmost", i.e., an x[i] > vec[length(vec)]
>
> and these should give *different* results (and not both the
> same).
>
> I don't think 'nomatch' would improve the relatively clean findInterval()
> behavior.
>
> There are three logical switches ... which allow 2^3
> variants, of which I now guess only 6 differ.
>
> Here's some R code showing the possibilities:
>
> (argsTF <- names(formals(findInterval))[-(1:2)])
> ## "rightmost.closed" "all.inside" "left.open"
> FT <- c(FALSE, TRUE)
> allFT <- as.matrix(expand.grid(rightmost.closed = FT,
>                                all.inside = FT,
>                                left.open = FT))
> allFT
> (cn <- substr(colnames(allFT), 1,1)) # "r" "a" "l"
>
> x <- 2:18
> v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
>
> fiAll <- apply(allFT, 1, function(r.a.f)
>     do.call(findInterval, c(list(x, v), as.list(r.a.f))))
>
> cbind(x, fiAll) # has all info
>
> ## must find cool 'column names' for fiAll: construct from r.., a.., l.. = F / T
> (cn1 <- apply(`dim<-`(c(".","|")[allFT+1L], dim(allFT)), 1, paste0,
>               collapse=""))
> ## "..." "|.." ".|." "||." "..|" "|.|" ".||" "|||"
> colnames(fiAll) <- cn1
> cbind(x, fiAll) ## --> col. 3 == 4 and 7 == 8
> ##==> show only unique columns:
> cbind(x, t(unique(t(fiAll))))
> ##   x ... |.. .|. ..| |.| .||
> ##   2   0   0   1   0   0   1
> ##   3   0   0   1   0   0   1
> ##   4   0   0   1   0   0   1
> ##   5   1   1   1   0   1   1
> ##   6   1   1   1   1   1   1
> ##   7   1   1   1   1   1   1
> ##   8   1   1   1   1   1   1
> ##   9   1   1   1   1   1   1
> ##  10   2   2   2   1   1   1
> ##  11   2   2   2   2   2   2
> ##  12   2   2   2   2   2   2
> ##  13   2   2   2   2   2   2
> ##  14   2   2   2   2   2   2
> ##  15   3   2   2   2   2   2
> ##  16   3   3   2   3   3   2
> ##  17   3   3   2   3   3   2
> ##  18   3   3   2   3   3   2
>
> Martin

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
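A minimal sketch of the kind of pipe-friendly helper asked for at the top of this message, assuming R >= 4.1 for the native pipe. The name na_if0 is hypothetical (it is not a base R function); it mimics dplyr's na_if() only for the simple case of one value to turn into NA, and it takes x just once so that it can follow |>.

```r
## Hypothetical helper, NOT part of base R: a pipe-friendly analogue of
## dplyr::na_if(). x is referenced only once by the caller, so it composes with |>.
na_if0 <- function(x, y) replace(x, x == y, NA)

dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
                  y = c(37, 12, 19, 30, 6, 15),
                  seek = 1:6)

## findInterval() returns 0 for seek values left of the leftmost x;
## turning those 0s into NA makes y[...] return NA at those positions.
result <- dat |>
  with(y[findInterval(seek, x) |> na_if0(0)])

result
## [1] NA 12 19 15 15 15
```

Because findInterval(seek, x) |> na_if0(0) reads left to right without repeating the index vector, it avoids defining a one-off zero2na(). Note that it only maps the "left of the leftmost" case to NA; the "right of the rightmost" case still returns length(x), which is the behavior the original example relies on.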