Further refining the vectorized (within a loop) exact string match function, I get times below 0.9 seconds while maintaining error checking. This is accomplished by removing which() and replacing 1:length() with seq_along().
sub2 <- function(pattern, replacement, x) { len <- length(x) y <- character(length=len) patlen <- length(pattern) replen <- length(replacement) if(patlen != replen) stop('Error: Pattern and replacement length do not match') for(i in seq_along(pattern)) { y[x==pattern[i]] <- replacement[i] } return(y) } system.time(for(i in 1:50000) sub2(patt, repl, X)) user system elapsed 0.86 0.00 0.86 Since the ordered vectors are perfectly aligned, might as well do an exact string match. Hence, I think this is not off-topic. Cheers, Adam On Wednesday, July 29, 2015 at 8:15:52 AM UTC-7, Bert Gunter wrote: > > There is confusion here. apply() family functions are **NOT** > vectorization -- they ARE loops (at the interpreter level), just done > in "functionalized" form. Please read background material (John > Chambers's books, MASS, or numerous others) to improve your > understanding and avoid posting erroneous comments. > > Cheers, > Bert > > > Bert Gunter > > "Data is not information. Information is not knowledge. And knowledge > is certainly not wisdom." > -- Clifford Stoll > > > On Tue, Jul 28, 2015 at 3:00 PM, John Thaden <jjth...@flash.net > <javascript:>> wrote: > > Adam, The method you propose gives a different result than the prior > methods for these example vectors > > X <- c("ab", "cd", "ef") > > patt <- c("b", "cd", "a") > > repl <- c("B", "CD", "A") > > > > Old method 1 > > > > mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, > x=X) > > gives > > b cd a > > "aB" "CD" "ef" > > > > Old method 2 > > > > sub2 <- function(pattern, replacement, x) { > > len <- length(x) > > if (length(pattern) == 1) > > pattern <- rep(pattern, len) > > if (length(replacement) == 1) > > replacement <- rep(replacement, len) > > FUN <- function(i, ...) { > > sub(pattern[i], replacement[i], x[i], fixed = TRUE) > > } > > idx <- 1:length(x) > > sapply(idx, FUN) > > } > > sub2(patt, repl, X) > > gives > > [1] "aB" "CD" "ef" > > > > Your method (I gave it the unique name "sub3") > > sub3 <- function(pattern, replacement, x) { len <- length(x) y > <- character(length=len) patlen <- length(pattern) replen <- > length(replacement) if(patlen != replen) stop('Error: Pattern and > replacement length do not match') for(i in 1:replen) { > y[which(x==pattern[i])] <- replacement[i] } return(y)}sub3(patt, repl, > X) > > gives[1] "" "CD" "" > > > > Granted, whatever it does, it does it faster > > #Old method 1 > > system.time(for(i in 1:50000) > > mapply(function(p,r,x) sub(p,r,x, fixed = TRUE),p=patt,r=repl,x=X)) > > user system elapsed > > 2.53 0.00 2.52 > > > > #Old method 2 > > system.time(for(i in 1:50000)sub2(patt, repl, X)) user system elapsed > > 2.32 0.00 2.32 > > > > #Your proposed method > > system.time(for(i in 1:50000) sub3(patt, repl, X)) > > user system elapsed > > 1.02 0.00 1.01 > > but would it still be faster if it actually solved the same problem? > > > > -John Thaden > > > > > > > > > > On Monday, July 27, 2015 11:40 PM, Adam Erickson < > adam.micha...@gmail.com <javascript:>> wrote: > > > > I know this is an old thread, but I wrote a simple FOR loop with > vectorized pattern replacement that is much faster than either of those (it > can also accept outputs differing in length from the patterns): > > sub2 <- function(pattern, replacement, x) { len <- length(x) > y <- character(length=len) patlen <- length(pattern) replen <- > length(replacement) if(patlen != replen) stop('Error: Pattern and > replacement length do not match') for(i in 1:replen) { > y[which(x==pattern[i])] <- replacement[i] } return(y) } > > system.time(test <- sub2(patt, repl, XX)) user system elapsed 0 > 0 0 > > Cheers, > > Adam > > On Wednesday, October 8, 2008 at 9:38:01 PM UTC-7, john wrote: > > Hello Christos, > > To my surprise, vectorization actually hurt processing speed!#Example > > X <- c("ab", "cd", "ef") > > patt <- c("b", "cd", "a") > > repl <- c("B", "CD", "A")sub2 <- function(pattern, replacement, x) { > > len <- length(x) > > if (length(pattern) == 1) > > pattern <- rep(pattern, len) > > if (length(replacement) == 1) > > replacement <- rep(replacement, len) > > FUN <- function(i, ...) { > > sub(pattern[i], replacement[i], x[i], fixed = TRUE) > > } > > idx <- 1:length(x) > > sapply(idx, FUN) > > } > > > > system.time( for(i in 1:10000) sub2(patt, repl, X) ) > > user system elapsed > > 1.18 0.07 1.26 system.time( for(i in 1:10000) > mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X) > ) > > user system elapsed > > 1.42 0.05 1.47 > > > > So much for avoiding loops. > > John Thaden======= At 2008-10-07, 14:58:10 Christos wrote: =======>John, > >>Try the following: > >> > >> mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, > x=X) > >> b cd a > >>"aB" "CD" "ef" > >> > >>-Christos>> -----My Original Message----- > >>> R pattern-matching and replacement functions are > >>> vectorized: they can operate on vectors of targets. > >>> However, they can only use one pattern and replacement. > >>> Here is code to apply a different pattern and replacement for > >>> every target. My question: can it be done better? > >>> > >>> sub2 <- function(pattern, replacement, x) { > >>> len <- length(x) > >>> if (length(pattern) == 1) > >>> pattern <- rep(pattern, len) > >>> if (length(replacement) == 1) > >>> replacement <- rep(replacement, len) > >>> FUN <- function(i, ...) { > >>> sub(pattern[i], replacement[i], x[i], fixed = TRUE) > >>> } > >>> idx <- 1:length(x) > >>> sapply(idx, FUN) > >>> } > >>> > >>> #Example > >>> X <- c("ab", "cd", "ef") > >>> patt <- c("b", "cd", "a") > >>> repl <- c("B", "CD", "A") > >>> sub2(patt, repl, X) > >>> > >>> -John______________________________________________ > > r-h...@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > r-h...@r-project.org <javascript:> mailing list -- To UNSUBSCRIBE and > more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > r-h...@r-project.org <javascript:> mailing list -- To UNSUBSCRIBE and > more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.