On Aug 6, 2012, at 12:06 PM, Marc Schwartz <marc_schwa...@me.com> wrote:
> Perhaps I am missing something, but why use sapply() when grepl() is already
> vectorized?
>
> is.letter <- function(x) grepl("[:alpha:]", x)
> is.number <- function(x) grepl("[:digit:]", x)

Sorry, typos in the above from my C&P. Should be:

is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

Marc

>
> x <- c(letters, 1:26)
>
> x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
>
> x <- rep(x, 1e3)
>
> > str(x)
>  chr [1:52000] "a2" "b10" "c8" "d3" "e6" "f1" "g5" ...
>
> > system.time(is.letter(x))
>    user  system elapsed
>   0.011   0.000   0.010
>
> > system.time(is.number(x))
>    user  system elapsed
>   0.010   0.000   0.011
>
>
> Regards,
>
> Marc Schwartz
>
> On Aug 6, 2012, at 11:51 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
>
>> Hello,
>>
>> Fun as an exercise in vectorization. 30 times faster. Don't look, guess.
>>
>> Gave it up? Ok, here it is.
>>
>>
>> is_letter <- function(x, pattern=c(letters, LETTERS)){
>>   sapply(x, function(y){
>>     any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
>>   })
>> }
>> # test ascii codes, just one loop.
>> has_letter <- function(x){
>>   sapply(x, function(y){
>>     y <- as.integer(charToRaw(y))
>>     any((65 <= y & y <= 90) | (97 <= y & y <= 122))
>>   })
>> }
>>
>> x <- c(letters, 1:26)
>> x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
>> x <- rep(x, 1e3)
>>
>> t1 <- system.time(is_letter(x))
>> t2 <- system.time(has_letter(x))
>> rbind(t1, t2, t1/t2)
>>    user.self sys.self elapsed user.child sys.child
>> t1     15.69        0   15.74         NA        NA
>> t2      0.50        0    0.50         NA        NA
>>        31.38      NaN   31.48         NA        NA
>>
>>
>> On 06-08-2012 17:25, Liviu Andronic wrote:
>>> Dear all
>>> I'm pretty sure that I'm approaching the problem in a wrong way.
>>> Suppose the following character vector:
>>> > (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
>>>  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
>>> > x
>>>  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"
>>> "l"   "m"   "n"
>>> [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"
>>> "z"   "1"   "2"
>>> [29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"
>>> "14"  "15"  "16"
>>> [43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
>>>
>>>
>>> How do you test whether the elements of the vector contain at least
>>> one letter (or at least one digit) and obtain a logical vector of the
>>> same dimension? I came up with the following awkward function:
>>> is_letter <- function(x, pattern=c(letters, LETTERS)){
>>>   sapply(x, function(y){
>>>     any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
>>>   })
>>> }
>>>
>>> > is_letter(x)
>>>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k
>>>     l     m     n     o
>>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>>>  TRUE  TRUE  TRUE  TRUE
>>>     p     q     r     s     t     u     v     w     x     y     z
>>>     1     2     3     4
>>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>>> FALSE FALSE FALSE FALSE
>>>     5     6     7     8     9    10    11    12    13    14    15
>>>    16    17    18    19
>>> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>> FALSE FALSE FALSE FALSE
>>>    20    21    22    23    24    25    26
>>> FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>> > is_letter(x, 0:9) ##function slightly misnamed
>>>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k
>>>     l     m     n     o
>>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
>>> FALSE FALSE FALSE FALSE
>>>     p     q     r     s     t     u     v     w     x     y     z
>>>     1     2     3     4
>>> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>>  TRUE  TRUE  TRUE  TRUE
>>>     5     6     7     8     9    10    11    12    13    14    15
>>>    16    17    18    19
>>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>>>  TRUE  TRUE  TRUE  TRUE
>>>    20    21    22    23    24    25    26
>>>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
>>>
>>>
>>> Is there a nicer way to do this?
>>> Regards
>>> Liviu
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
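
The bracket correction in Marc's follow-up can be checked on a small vector. What follows is a minimal sketch, not code from the thread: the names contains_letter and contains_digit and the six-element test vector are made up for illustration, and the single-bracket result assumes R's default regex engine (not perl = TRUE).

```r
# Hypothetical helper names; the patterns are the corrected ones from the thread.
contains_letter <- function(x) grepl("[[:alpha:]]", x)
contains_digit  <- function(x) grepl("[[:digit:]]", x)

# Small illustrative vector (an assumption, not the 52000-element one timed above).
x <- c("a10", "b7", "k", "z", "1", "26")

contains_letter(x)  # TRUE TRUE TRUE TRUE FALSE FALSE
contains_digit(x)   # TRUE TRUE FALSE FALSE TRUE TRUE

# With single brackets, "[:alpha:]" is an ordinary bracket expression that
# matches any one of the characters ':', 'a', 'l', 'p', 'h'. Only "a10"
# contains one of those, so "b7", "k" and "z" are wrongly reported as
# having no letter.
single_bracket <- grepl("[:alpha:]", x)
single_bracket      # TRUE FALSE FALSE FALSE FALSE FALSE
```

Both helpers make a single grepl() call over the whole vector, which is the reason no sapply() loop is needed. Note that grepl() returns an unnamed logical vector, unlike the sapply() versions, which carry the input strings as names.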