On 08/06/2012 09:51 AM, Rui Barradas wrote:
Hello,
Fun as an exercise in vectorization. 30 times faster. Don't look, guess.
> system.time(res0 <- grepl("[[:alpha:]]", x))
   user  system elapsed
  0.060   0.000   0.061
> system.time(res1 <- has_letter(x))
   user  system elapsed
  3.728   0.008   3.747
> all.equal(res0, res1, check.attributes=FALSE)
[1] TRUE
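The original question also asked about digits. The grepl() call above is already vectorized over x, so the digit case should only need a different character class. A minimal sketch, assuming a small made-up vector y (not data from the thread); note that the POSIX classes are locale-dependent, so [[:alpha:]] may also match non-ASCII letters, which the ASCII-code test in has_letter() does not.

```r
# y is a hypothetical example vector, not data from the thread
y <- c("a10", "k", "26", "")

has_alpha <- grepl("[[:alpha:]]", y)  # TRUE where the element has at least one letter
has_digit <- grepl("[[:digit:]]", y)  # TRUE where the element has at least one digit

# Side-by-side view: one row per element of y
data.frame(y, has_alpha, has_digit)
```

Both results are plain logical vectors of the same length as y, without the names that sapply() attaches.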
Given up? OK, here it is.
is_letter <- function(x, pattern=c(letters, LETTERS)){
    # for each element, test every pattern as a fixed (non-regex) string
    sapply(x, function(y){
        any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
    })
}
# test ascii codes, just one loop.
has_letter <- function(x){
    sapply(x, function(y){
        y <- as.integer(charToRaw(y))
        any((65 <= y & y <= 90) | (97 <= y & y <= 122))
    })
}
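To make the ASCII test in has_letter() concrete: charToRaw() returns the bytes of a string, and 65 to 90 are the codes for A to Z while 97 to 122 are the codes for a to z. The sketch below works through one string, then shows a variant using vapply(), which fixes the return type and drops the names. The name has_letter_v is made up here for illustration and is not from the thread.

```r
# Byte codes of a three-character example string
codes <- as.integer(charToRaw("aZ9"))
codes  # 97 90 57

# Element-wise range test: upper-case range OR lower-case range
is_ascii_letter <- (65L <= codes & codes <= 90L) | (97L <= codes & codes <= 122L)

# Hypothetical variant of has_letter(): vapply() guarantees one logical
# per element, and USE.NAMES = FALSE returns an unnamed vector
has_letter_v <- function(x) {
  vapply(x, function(y) {
    codes <- as.integer(charToRaw(y))
    any((65L <= codes & codes <= 90L) | (97L <= codes & codes <= 122L))
  }, logical(1), USE.NAMES = FALSE)
}

res <- has_letter_v(c("a10", "26", ""))
```

This still loops once per element, so it is not expected to be much faster than has_letter(); the point is only the predictable return type.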
x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
x <- rep(x, 1e3)
t1 <- system.time(is_letter(x))
t2 <- system.time(has_letter(x))
rbind(t1, t2, t1/t2)
   user.self sys.self elapsed user.child sys.child
t1     15.69        0   15.74         NA        NA
t2      0.50        0    0.50         NA        NA
       31.38      NaN   31.48         NA        NA
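The unlabelled last row of the table is the ratio t1/t2. A sketch of how the approaches from this thread could be timed side by side, with a check that they agree. It assumes a smaller input than the 1e3 repetitions used above (n_rep is a name chosen here for illustration), and timings vary by machine, so no output is shown.

```r
n_rep <- 100  # hypothetical repetition count, smaller than the thread's 1e3
x <- rep(c(letters, 1:26), n_rep)

# Same definitions as in the thread, repeated so the sketch is self-contained
is_letter <- function(x, pattern = c(letters, LETTERS)) {
  sapply(x, function(y) any(sapply(pattern, function(z) grepl(z, y, fixed = TRUE))))
}
has_letter <- function(x) {
  sapply(x, function(y) {
    y <- as.integer(charToRaw(y))
    any((65 <= y & y <= 90) | (97 <= y & y <= 122))
  })
}

t1 <- system.time(r1 <- is_letter(x))
t2 <- system.time(r2 <- has_letter(x))
t3 <- system.time(r3 <- grepl("[[:alpha:]]", x))

# On this pure-ASCII input all three results should agree
# (sapply() adds names, grepl() does not, hence unname())
stopifnot(isTRUE(all.equal(r1, r2)), isTRUE(all.equal(unname(r2), r3)))

# One row of timings per approach
rbind(is_letter = t1, has_letter = t2, grepl = t3)
```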
On 06-08-2012 17:25, Liviu Andronic wrote:
Dear all
I'm pretty sure that I'm approaching the problem in the wrong way.
Suppose the following character vector:
(x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
[1] "a10" "b7" "c2" "d3" "e6" "f1" "g5" "h8" "i9" "j4"
x
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
[29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
[43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
How do you test whether each element of the vector contains at least
one letter (or at least one digit) and obtain a logical vector of the
same length? I came up with the following awkward function:
is_letter <- function(x, pattern=c(letters, LETTERS)){
    sapply(x, function(y){
        any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
    })
}
is_letter(x)
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   20    21    22    23    24    25    26
FALSE FALSE FALSE FALSE FALSE FALSE FALSE
is_letter(x, 0:9) ##function slightly misnamed
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
   20    21    22    23    24    25    26
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
Is there a nicer way to do this? Regards
Liviu
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793