Hello,

This was fun as an exercise in vectorization: the version below runs about 30 times faster. Try to guess how before you look.

Given up? OK, here it is.


# Original version: for each element of x, one grepl() call per pattern character.
is_letter <- function(x, pattern=c(letters, LETTERS)){
    sapply(x, function(y){
        any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
    })
}
# Test the ASCII codes of each string's bytes: just one loop, over the elements of x.
has_letter <- function(x){
    sapply(x, function(y){
        y <- as.integer(charToRaw(y))
        # 65-90 are "A"-"Z", 97-122 are "a"-"z"
        any((65 <= y & y <= 90) | (97 <= y & y <= 122))
    })
}

x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
x <- rep(x, 1e3)

t1 <- system.time(is_letter(x))
t2 <- system.time(has_letter(x))
rbind(t1, t2, t1/t2)
   user.self sys.self elapsed user.child sys.child
t1     15.69        0   15.74         NA        NA
t2      0.50        0    0.50         NA        NA
       31.38      NaN   31.48         NA        NA
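
For comparison, here is a minimal sketch of a version with no explicit loop at all, relying on the fact that grepl() is itself vectorized over x. The names has_letter_re and has_digit_re are made up for this illustration and are not from the thread. I use perl=TRUE so that the range [A-Za-z] means ASCII letters only, like the 65-90 and 97-122 test above; "[[:alpha:]]" would be a locale-aware alternative. I have not timed it against the two functions above.

```r
# Sketch only: one vectorized grepl() call per test, no sapply().
# has_letter_re and has_digit_re are hypothetical names for illustration.
has_letter_re <- function(x) grepl("[A-Za-z]", x, perl = TRUE)
has_digit_re  <- function(x) grepl("[0-9]", x, perl = TRUE)

x <- c("a10", "b7", "k", "1", "26")
has_letter_re(x)  # TRUE TRUE TRUE FALSE FALSE
has_digit_re(x)   # TRUE TRUE FALSE TRUE TRUE
```

Unlike the sapply() versions, the result is an unnamed logical vector; names(res) <- x restores the names if they are wanted.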


On 06-08-2012 17:25, Liviu Andronic wrote:
Dear all
I'm pretty sure that I'm approaching the problem the wrong way.
Suppose the following character vector:
(x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
x
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
[29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
[43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"


How do you test whether the elements of the vector contain at least
one letter (or at least one digit) and obtain a logical vector of the
same dimension? I came up with the following awkward function:
is_letter <- function(x, pattern=c(letters, LETTERS)){
     sapply(x, function(y){
         any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
     })
}

is_letter(x)
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   20    21    22    23    24    25    26
FALSE FALSE FALSE FALSE FALSE FALSE FALSE
is_letter(x, 0:9)  ##function slightly misnamed
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
   20    21    22    23    24    25    26
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE


Is there a nicer way to do this? Regards
Liviu



