>>>>> Gabor Grothendieck >>>>> on Mon, 16 Sep 2024 11:21:55 -0400 writes:
> Suppose we have `dat` shown below and we want to find the `y` value
> corresponding to the last value in `x` equal to the corresponding component
> of `seek` and we wish to return an output the same length as `seek` using
> `findInterval` to perform the search.  This returns the correct result:

>   dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
>                     y = c(37, 12, 19, 30, 6, 15),
>                     seek = 1:6)

>   zero2na <- function(x) replace(x, x == 0, NA)

>   dat |>
>     transform(dat, result = y[ zero2na(findInterval(seek, x)) ] ) |>
>     _$result
>   ## [1] NA 12 19 15 15 15

I'd write that as

    with(dat, y[ zero2na(findInterval(seek, x)) ] )

so I can read it without jumping through hoops or standing on my head ...

> Since `findInterval` returns an index it is natural that the next step be
> to use the index and it is also common that we want a result that is the
> same length as the input.

I think your example, where x and y are of the same length, is not typical.

Note that the design of findInterval(x, vec, ..) is indeed to always
return an index, but there isn't any "nomatch", but rather a

  - "left of the leftmost", i.e., an x[i] < vec[1]
    (as 'vec' must be sorted increasingly), or
  - "right of rightmost", i.e., an x[i] > vec[length(vec)]

and these should give *different* results (and not both the same).

I don't think 'nomatch' would improve the relatively clean
findInterval() behavior.

There are three logical switches ...
which allow 2^3 variants of which I now guess only 6 differ:

Here's some R code showing the possibilities:

    (argsTF <- names(formals(findInterval))[-(1:2)])
    # "rightmost.closed" "all.inside" "left.open"

    FT <- c(FALSE, TRUE)
    allFT <- as.matrix(expand.grid(rightmost.closed = FT,
                                   all.inside = FT,
                                   left.open = FT))
    allFT
    (cn <- substr(colnames(allFT), 1,1)) # "r" "a" "l"

    x <- 2:18
    v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
    fiAll <- apply(allFT, 1, function(r.a.f)
        do.call(findInterval, c(list(x, v), as.list(r.a.f))))
    cbind(x, fiAll) # has all info

    ## must find cool 'column names' for fiAll: construct from r.., a.., l.. = F / T
    (cn1 <- apply(`dim<-`(c(".","|")[allFT+1L], dim(allFT)), 1, paste0, collapse=""))
    ## "..." "|.." ".|." "||." "..|" "|.|" ".||" "|||"
    colnames(fiAll) <- cn1
    cbind(x, fiAll)
    ## --> col. 3 == 4 and 7 == 8
    ##==> show only unique columns:
    cbind(x, t(unique(t(fiAll))))
    ##   x ... |.. .|. ..| |.| .||
    ##   2   0   0   1   0   0   1
    ##   3   0   0   1   0   0   1
    ##   4   0   0   1   0   0   1
    ##   5   1   1   1   0   1   1
    ##   6   1   1   1   1   1   1
    ##   7   1   1   1   1   1   1
    ##   8   1   1   1   1   1   1
    ##   9   1   1   1   1   1   1
    ##  10   2   2   2   1   1   1
    ##  11   2   2   2   2   2   2
    ##  12   2   2   2   2   2   2
    ##  13   2   2   2   2   2   2
    ##  14   2   2   2   2   2   2
    ##  15   3   2   2   2   2   2
    ##  16   3   3   2   3   3   2
    ##  17   3   3   2   3   3   2
    ##  18   3   3   2   3   3   2

Martin

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
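
Addendum (an illustrative sketch, not part of the original message): the point about "left of the leftmost" versus "right of rightmost" can be seen with the default switches. The values of `x` and `vec` below are made up for illustration only. A single "nomatch" value would presumably collapse the two out-of-range cases, which the default result keeps apart.

```r
## Sketch only: 'x' and 'vec' are made-up values for illustration.
vec <- c(5, 10, 15)           # must be sorted increasingly
x   <- c(1, 7, 20)            # left of leftmost, inside, right of rightmost
idx <- findInterval(x, vec)   # all three logical switches at their defaults
idx
## [1] 0 1 3

## The two out-of-range cases stay distinguishable:
left.of.leftmost   <- idx == 0L
right.of.rightmost <- x > vec[length(vec)]
```

Here 0 marks a value below `vec[1]`, while a value above the last break gets `length(vec)`, so the two cases can be treated differently downstream.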