Hello,
This was fun as an exercise in vectorization. The version below is about 30 times faster. Don't look yet, try to guess.
Given up? OK, here it is.
is_letter <- function(x, pattern=c(letters, LETTERS)){
    sapply(x, function(y){
        any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
    })
}
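# Test the ASCII codes directly, using just one loop.
has_letter <- function(x){
    sapply(x, function(y){
        # Integer byte values of the string's characters
        y <- as.integer(charToRaw(y))
        # 65..90 are 'A'..'Z', 97..122 are 'a'..'z'
        any((65 <= y & y <= 90) | (97 <= y & y <= 122))
    })
}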
x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
x <- rep(x, 1e3)
t1 <- system.time(is_letter(x))
t2 <- system.time(has_letter(x))
rbind(t1, t2, t1/t2)
   user.self sys.self elapsed user.child sys.child
t1     15.69        0   15.74         NA        NA
t2      0.50        0    0.50         NA        NA
       31.38      NaN   31.48         NA        NA
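For comparison (not timed here), the same test can probably be written with no explicit loop at all, since grepl() is itself vectorized over x. This is only a minimal sketch, assuming plain ASCII input; the names has_letter_re and has_digit_re are made up for illustration and do not come from the original post.

```r
# Hypothetical helpers: a single regex call over the whole vector.
# Assumes ASCII input; character ranges such as A-Z can be
# locale-dependent in R's default regex engine.
has_letter_re <- function(x) grepl("[A-Za-z]", x)
has_digit_re  <- function(x) grepl("[0-9]", x)

x <- c("a10", "b7", "k", "1", "26")
has_letter_re(x)   # TRUE TRUE TRUE FALSE FALSE
has_digit_re(x)    # TRUE TRUE FALSE TRUE TRUE
```

Unlike the sapply() versions, the result carries no names. If non-ASCII letters should also count, "[[:alpha:]]" might be the better pattern, though what it matches depends on the locale.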
On 06-08-2012 17:25, Liviu Andronic wrote:
Dear all
I'm pretty sure that I'm approaching this problem the wrong way.
Suppose the following character vector:
(x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
[1] "a10" "b7" "c2" "d3" "e6" "f1" "g5" "h8" "i9" "j4"
x
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
[29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
[43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
How do you test whether each element of the vector contains at least
one letter (or at least one digit) and obtain a logical vector of the
same length? I came up with the following awkward function:
is_letter <- function(x, pattern=c(letters, LETTERS)){
    sapply(x, function(y){
        any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
    })
}
is_letter(x)
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   20    21    22    23    24    25    26
FALSE FALSE FALSE FALSE FALSE FALSE FALSE
is_letter(x, 0:9) ##function slightly misnamed
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
   20    21    22    23    24    25    26
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
Is there a nicer way to do this? Regards
Liviu
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.