On 9/22/2008 1:51 PM, Markus Loecher wrote:
Dear R users,
the help for findInterval(x,vec) suggests a logarithmic dependence on N
(=length(vec)), which would imply a binary search type algorithm.
However, when I "test" this hypothesis, in the following manner:

R is open source. Why test things this way, when you can look at the source? You don't even need to go to C code for this:

> findInterval
function (x, vec, rightmost.closed = FALSE, all.inside = FALSE)
{
    if (any(is.na(vec)))
        stop("'vec' contains NAs")
    if (is.unsorted(vec))
        stop("'vec' must be sorted non-decreasingly")
    if (has.na <- any(ix <- is.na(x)))
        x <- x[!ix]
    nx <- length(x)
    index <- integer(nx)
.C("find_interv_vec", xt = as.double(vec), n = as.integer(length(vec)), x = as.double(x), nx = as.integer(nx), as.logical(rightmost.closed),
        as.logical(all.inside), index, DUP = FALSE, NAOK = TRUE,
        PACKAGE = "base")
    if (has.na) {
        ii <- as.integer(ix)
        ii[ix] <- NA
        ii[!ix] <- index
        ii
    }
    else index
}
<environment: namespace:base>

Notice the "is.unsorted" test. How could that be anything other than linear execution time in N? Similarly for any(ix <- is.na(x)).

If you know the answers to those tests (as you do in your simulation), you could presumably get O(log(n)) behaviour by writing a new function that skipped them. But you could take a look at the source code (in https://svn.r-project.org/R/trunk/src/appl/interv.c) if you want to check, or if you notice any weird timings.

Duncan Murdoch



set.seed(-3645);
l <- vector();
N.seq <- c(5000, 500000, 1000000, 10000000, 50000000);k <- 1
for (N in N.seq){
  tmp <- sort(round(stats::rt(N, df=2), 2));
  l[k] <- system.time(it3 <- findInterval(-1, tmp))[2];k <- k + 1;
}
plot(N.seq,l,type="b",xlab="length(vec)", ylab="CPU time");

the resulting plot suggests a linear relationship.
I must be missing sth. here ?

Thanks !

Markus

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to