Uwe Ligges <lig...@statistik.tu-dortmund.de> writes: > Ken-JP wrote: >> The code below shows what I'm trying to get rid of. >> If there is no way to get rid of the loop, I will try to use >> package( inline >> ). >> I'm just curious as to whether there is a "vector way" of doing this >> algorithm. > > > I don't see a vector way. If speed is an issue, I'd suggest to port
Parsing the code, it seems like a run of 1's starts with a value greater than .75, and continues until a value less than .5. So this... x1 <- x > .75 c(x1[[1]], x1[-1] & !x1[-n]) flags the start of each run, and this y1 <- lapply(split(x > .5, cumsum(c(x1[[1]], x1[-1] & !x1[-n]))), cumprod) splits candidate runs x > .5 into subsets beginning at each start point, and then processes the subset to extend the run as long as the values are greater than .5. My 'vectorized' version, with an imperfect solution to the first run and glossing over the point x==.5 in the original code, is encode.2 <- function(x) { n <- length(x) x0 <- x < .25 y0 <- lapply(split(x <= .5, cumsum(c(x0[[1]], x0[-1] & !x0[-n]))), cumprod) x1 <- x > .75 y1 <- lapply(split(x > .5, cumsum(c(x1[[1]], x1[-1] & !x1[-n]))), cumprod) as.vector((cumsum(abs(x-.5) > .25) != 0) * (-unlist(y0, use.names=FALSE) + unlist(y1, use.names=FALSE))) } this seems to be 7-20x faster than encode.1 encode.1 <- function(x) { n <- length( x ) y <- rep(NA, n) yprev <- 0; for ( i in (1:n)) { if ( x[i]>0.75 ) { y[i] <- 1; } else if ( x[i]<0.25 ) { y[i] <- -1; } else if ( yprev==1 & x[i]<0.5) { y[i] <- 0; } else if ( yprev==-1 & x[i]>0.5) { y[i] <- 0; } else { y[i] <- yprev } yprev <- y[i]; } y } > set.seed(1) > mean(replicate(10, system.time(encode.1(runif(100000)), TRUE)[[3]])) [1] 0.7256 > set.seed(1) > mean(replicate(10, system.time(encode.2(runif(100000)), TRUE)[[3]])) [1] 0.0983 > mean(replicate(10, system.time(encode.1(integer(100000)), TRUE)[[3]])) [1] 0.6288 > mean(replicate(10, system.time(encode.2(integer(100000)), TRUE)[[3]])) [1] 0.0338 and, modulo x==.5, even seems to produce the same result ;) > set.seed(1); res.1 <- replicate(10, encode.1(runif(1000))) > set.seed(1); res.2 <- replicate(10, encode.2(runif(1000))) > identical(res.1, res.2) [1] TRUE Algorithm f7 of https://stat.ethz.ch/pipermail/r-devel/2008-April/049111.html might point to fast (comparable to C) R implementations. Martin > this small part to C, it's very simple and may yield a considerable > performance boost (untested). It can probably be used as a textbook > example for code where porting to C make sense. > > Best, > Uwe Ligges > > > >> # >> ----------------------------------------------------------------------------------------- >> set.seed(1) >> x <- runif(100) >> n <- length( x ) >> y <- rep(NA, n) >> yprev <- 0; >> for ( i in (1:n)) { >> if ( x[i]>0.75 ) { >> y[i] <- 1; >> } else if ( x[i]<0.25 ) { >> y[i] <- -1; >> } else if ( yprev==1 & x[i]<0.5) { >> y[i] <- 0; >> } else if ( yprev==-1 & x[i]>0.5) { >> y[i] <- 0; >> } else { >> y[i] <- yprev >> } >> yprev <- y[i]; >> } >> >>> y >> [1] 0 0 0 1 -1 1 1 1 1 -1 -1 -1 0 0 1 0 0 1 0 1 1 -1 0 >> -1 -1 >> [26] -1 -1 -1 1 0 0 0 0 -1 1 1 1 -1 0 0 1 1 1 1 1 1 >> -1 -1 >> 0 0 >> [51] 0 1 0 -1 -1 -1 -1 0 0 0 1 0 0 0 0 0 0 1 -1 1 0 >> 1 0 0 0 >> [76] 1 1 0 1 1 0 0 0 0 1 -1 0 -1 -1 -1 -1 -1 0 1 1 1 >> 0 0 1 1 >> >> > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.