Only an extra set of brackets:

is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)
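
As a rough sketch of why the outer brackets matter (the test strings in s are made up for illustration, and this assumes grepl()'s default regex engine, i.e. perl = FALSE): "[:alpha:]" on its own is read as an ordinary bracket expression listing the characters ':', 'a', 'l', 'p' and 'h', while "[[:alpha:]]" is a bracket expression that contains the POSIX class of alphabetic characters.

```r
# Hypothetical test strings, chosen for illustration only.
s <- c("a", "h", "p", ":", "b", "z", "7")

# Single brackets: a bracket expression listing the characters : a l p h
wrong <- grepl("[:alpha:]", s)

# Double brackets: the POSIX class [:alpha:] inside a bracket expression
right <- grepl("[[:alpha:]]", s)

# Side-by-side comparison:
#   wrong is TRUE for "a", "h", "p", ":" and FALSE for "b", "z", "7"
#   right is TRUE for every letter and FALSE for ":" and "7"
print(data.frame(s, wrong, right))
```

The same reasoning applies to "[:digit:]" versus "[[:digit:]]".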
Without them, the functions are fast, but wrong.

> x
 [1] "a8"  "b5"  "c10" "d1"  "e6"  "f2"  "g4"  "h3"  "i7"  "j9"  "k"   "l"
[13] "m"   "n"   "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"
[25] "y"   "z"   "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"
[37] "11"  "12"  "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"
[49] "23"  "24"  "25"  "26"
> is.letter <- function(x) grepl("[:alpha:]", x)
> is.letter(x)
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
[13] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE
> is.letter <- function(x) grepl("[[:alpha:]]", x)
> is.letter(x)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Marc Schwartz
> Sent: Monday, August 06, 2012 12:07 PM
> To: Rui Barradas
> Cc: r-help
> Subject: Re: [R] test if elements of a character vector contain letters
>
> Perhaps I am missing something, but why use sapply() when grepl() is
> already vectorized?
>
> is.letter <- function(x) grepl("[:alpha:]", x)
> is.number <- function(x) grepl("[:digit:]", x)
>
> x <- c(letters, 1:26)
>
> x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
>
> x <- rep(x, 1e3)
>
> > str(x)
>  chr [1:52000] "a2" "b10" "c8" "d3" "e6" "f1" "g5" ...
>
> > system.time(is.letter(x))
>    user  system elapsed
>   0.011   0.000   0.010
>
> > system.time(is.number(x))
>    user  system elapsed
>   0.010   0.000   0.011
>
>
> Regards,
>
> Marc Schwartz
>
> On Aug 6, 2012, at 11:51 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
>
> > Hello,
> >
> > Fun as an exercise in vectorization. 30 times faster. Don't look, guess.
> >
> > Gave it up? Ok, here it is.
> >
> >
> > is_letter <- function(x, pattern=c(letters, LETTERS)){
> >     sapply(x, function(y){
> >         any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
> >     })
> > }
> > # test ascii codes, just one loop.
> > has_letter <- function(x){
> >     sapply(x, function(y){
> >         y <- as.integer(charToRaw(y))
> >         any((65 <= y & y <= 90) | (97 <= y & y <= 122))
> >     })
> > }
> >
> > x <- c(letters, 1:26)
> > x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
> > x <- rep(x, 1e3)
> >
> > t1 <- system.time(is_letter(x))
> > t2 <- system.time(has_letter(x))
> > rbind(t1, t2, t1/t2)
> >    user.self sys.self elapsed user.child sys.child
> > t1     15.69        0   15.74         NA        NA
> > t2      0.50        0    0.50         NA        NA
> >        31.38      NaN   31.48         NA        NA
> >
> >
> > On 06-08-2012 17:25, Liviu Andronic wrote:
> >> Dear all
> >> I'm pretty sure that I'm approaching the problem in a wrong way.
> >> Suppose the following character vector:
> >>> (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
> >>  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
> >>> x
> >>  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
> >> [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
> >> [29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
> >> [43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
> >>
> >>
> >> How do you test whether the elements of the vector contain at least
> >> one letter (or at least one digit) and obtain a logical vector of the
> >> same dimension?
> >> I came up with the following awkward function:
> >> is_letter <- function(x, pattern=c(letters, LETTERS)){
> >>     sapply(x, function(y){
> >>         any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
> >>     })
> >> }
> >>
> >>> is_letter(x)
> >>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> >>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
> >>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
> >> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> >>    20    21    22    23    24    25    26
> >> FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> >>> is_letter(x, 0:9) ##function slightly misnamed
> >>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
> >>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
> >> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> >>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> >>    20    21    22    23    24    25    26
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> >>
> >>
> >> Is there a nicer way to do this? Regards
> >> Liviu
> >
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.