On Apr 21, 2010, at 11:07 AM, Jeff Brown wrote:
At April 21, 2010 10:16:10 AM EDT mieke posted to Nabble:
Hey there,
I need to count the matches of a sequence seq=c(2,3,4) in a long vector
v=c(4,2,5,8,9,2,3,5,6,1,7,2,3,4,5,....).
With sum(v %in% seq) I only get the sum of sum(v %in% 2), sum(v %in% 3) and
sum(v %in% 4), but that's not what I need :(
This sort of calculation can't be vectorized; you'll have to iterate through
the sequence, e.g. with a "for" loop. I don't know if a routine has already
been written.
A vectorized solution:
vseq <- c(2,3,4)
v <- c(4,2,5,8,9,2,3,5,6,1,7,2,3,4,5)
sum( v[1:(length(v) - 2)] == vseq[1] &
     v[2:(length(v) - 1)] == vseq[2] &
     v[3:(length(v)    )] == vseq[3] )
# [1] 1
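The expression above is written out by hand for a pattern of length 3. A minimal sketch of the same idea for a pattern of any length follows; the helper name count_seq is hypothetical and not part of the original post, and it assumes neither vector contains NA.

```r
# Hypothetical helper (not from the original post): count how often `pat`
# occurs as a contiguous run in `v`, for a pattern of any length.
count_seq <- function(v, pat) {
  n <- length(v)
  k <- length(pat)
  if (k == 0L || k > n) return(0L)    # nothing can match
  starts <- seq_len(n - k + 1L)       # candidate start positions
  # For each offset j, compare the shifted slice of v with pat[j],
  # then AND the k logical vectors together.
  hits <- Reduce(`&`, lapply(seq_len(k),
                             function(j) v[starts + j - 1L] == pat[j]))
  sum(hits)
}

v    <- c(4,2,5,8,9,2,3,5,6,1,7,2,3,4,5)
vseq <- c(2,3,4)
count_seq(v, vseq)
# [1] 1
```

Overlapping occurrences are counted separately, so count_seq(c(1,1,1), c(1,1)) gives 2.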
And a check on relative speed, which was also a concern you expressed:
require(rbenchmark)
require(zoo)
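# Elementwise comparison of three shifted slices of v (pattern length fixed at 3)
logsum <- function(v, vseq) sum( v[1:(length(v) - 2)] == vseq[1] &
                                 v[2:(length(v) - 1)] == vseq[2] &
                                 v[3:(length(v)    )] == vseq[3] )
# Globals read inside summatches() below
lseq <- length(vseq)
lv <- length(v)
# Rolling window of width 3 via zoo's rollapply
sumroll <- function(v, vseq) sum( rollapply(zoo(v), 3, function(x) all(x == vseq)) )
# Explicit window at each start position via sapply
summatches <- function(v, vseq) sum( sapply(1:(lv - lseq + 1),
                                     function(i) all(v[i:(i + lseq - 1)] == vseq)) )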
> benchmark(
+ logsum(v, vseq),
+ summatches(v,vseq),
+ sumroll(v,vseq),
+ order=c('replications', 'elapsed'))
                 test replications elapsed relative user.self sys.self user.child sys.child
1     logsum(v, vseq)          100   0.002      1.0     0.003    0.001          0         0
2 summatches(v, vseq)          100   0.016      8.0     0.016    0.000          0         0
3    sumroll(v, vseq)          100   0.087     43.5     0.087    0.001          0         0
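Since summatches() as timed above reads lv and lseq from the global environment, a self-contained variant may be easier to reuse. The sketch below is an assumption-laden illustration, not the code that produced the timings: count_loop and count_embed are hypothetical names, both use base R only, and no timings are claimed for them.

```r
# Hypothetical, self-contained variants (not the functions benchmarked above).

# Window-by-window check; lengths are computed inside the function.
count_loop <- function(v, pat) {
  n <- length(v)
  k <- length(pat)
  if (k == 0L || k > n) return(0L)
  sum(vapply(seq_len(n - k + 1L),
             function(i) all(v[i:(i + k - 1L)] == pat),
             logical(1)))
}

# embed() lays out every length-k window of v as a matrix row,
# most recent element first, so each row is compared with rev(pat).
count_embed <- function(v, pat) {
  k <- length(pat)
  if (k == 0L || k > length(v)) return(0L)
  w <- embed(v, k)                          # (n - k + 1) x k matrix
  sum(rowSums(sweep(w, 2, rev(pat), `==`)) == k)
}

v    <- c(4,2,5,8,9,2,3,5,6,1,7,2,3,4,5)
vseq <- c(2,3,4)
# Both variants should agree on the example data
stopifnot(count_loop(v, vseq) == count_embed(v, vseq))
```

Either function could be passed to benchmark() in place of summatches() to compare timings on your own data.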
--
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help