On Apr 21, 2010, at 11:07 AM, Jeff Brown wrote:

At April 21, 2010 10:16:10 AM EDT mieke posted to Nabble:

Hey there,

I need to count the matches of a sequence seq=c(2,3,4) in a long vector
v=c(4,2,5,8,9,2,3,5,6,1,7,2,3,4,5,....).
With sum(v %in% seq) I only get the sum of sum(v %in% 2), sum(v %in% 3) and
sum(v %in% 4), but that's not what I need :(


This sort of calculation can't be vectorized; you'll have to iterate through the sequence, e.g. with a "for" loop. I don't know if a routine has already
been written.

A vectorized solution:

 vseq <-c(2,3,4)
 v <- c(4,2,5,8,9,2,3,5,6,1,7,2,3,4,5)
 sum( v[1:(length(v) -2)] == vseq[1] &
      v[2:(length(v) -1)] == vseq[2] &
      v[3:(length(v) )] == vseq[3]    )
# [1] 1
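
The expression above hard-codes a pattern of length 3. A minimal sketch of the same shifted-comparison idea for a pattern of any length follows; count_seq is a hypothetical helper name, not something from the thread or from any package. It loops over the (short) pattern rather than over the (long) vector, so each step is still a vectorized comparison.

```r
# Hypothetical helper (an assumption, not part of the original post):
# count how often `pattern` occurs as a contiguous run in `x`.
count_seq <- function(x, pattern) {
  n <- length(x)
  k <- length(pattern)
  if (k == 0 || k > n) return(0L)
  starts <- seq_len(n - k + 1)        # every position where a match could begin
  hit <- rep(TRUE, length(starts))
  for (j in seq_len(k)) {
    # compare the j-th pattern element with x shifted by j - 1 positions
    hit <- hit & (x[starts + j - 1] == pattern[j])
  }
  sum(hit)                            # overlapping matches are counted separately
}

v <- c(4,2,5,8,9,2,3,5,6,1,7,2,3,4,5)
count_seq(v, c(2,3,4))
# [1] 1
count_seq(v, c(2,3))
# [1] 2
```

Note that an NA in x or pattern would propagate into the sum; this sketch assumes there are none.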

And a check on relative speed which was also a concern you expressed:

require(rbenchmark)
require(zoo)
logsum <- function(v,vseq) sum( v[1:(length(v) -2)] == vseq[1] &
      v[2:(length(v) -1)] == vseq[2] &
      v[3:(length(v) )] == vseq[3] )

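# Rolling-window version using zoo; the window width follows the pattern length.
sumroll <- function(v,vseq) sum( rollapply(zoo(v), length(vseq), function(x) all(x == vseq)) )

# Explicit window-by-window comparison. The lengths are computed inside the
# function so that it does not depend on global variables.
summatches <- function(v,vseq) {
  lseq <- length(vseq)
  lv <- length(v)
  sum( sapply(1:(lv-lseq +1), function(i) all(v[i:(i+lseq-1)] == vseq)) )
}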


> benchmark(
+    logsum(v, vseq),
+    summatches(v,vseq),
+    sumroll(v,vseq),
+    order=c('replications', 'elapsed'))
                 test replications elapsed relative user.self sys.self user.child sys.child
1     logsum(v, vseq)          100   0.002      1.0     0.003    0.001          0         0
2 summatches(v, vseq)          100   0.016      8.0     0.016    0.000          0         0
3    sumroll(v, vseq)          100   0.087     43.5     0.087    0.001          0         0
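
The timings above come from a 15-element vector, so they may not say much about the "long vector" in the original question. A rough sketch of how one might repeat the comparison on longer input, using only base R, is given here. The test data (1e5 random draws from 1..9) and the names shifted and windowed are assumptions made for illustration; no timings are quoted because they will vary by machine.

```r
# Assumed test data, not from the thread: 1e5 random draws from 1..9.
set.seed(1)
big_v <- sample(1:9, 1e5, replace = TRUE)
pat <- c(2, 3, 4)

# Shifted-comparison approach, same idea as logsum() (pattern length 3 only).
shifted <- function(v, p) {
  n <- length(v)
  sum(v[1:(n - 2)] == p[1] & v[2:(n - 1)] == p[2] & v[3:n] == p[3])
}

# Window-by-window approach, same idea as summatches().
windowed <- function(v, p) {
  k <- length(p)
  sum(vapply(seq_len(length(v) - k + 1),
             function(i) all(v[i:(i + k - 1)] == p), logical(1)))
}

# Both approaches should report the same count before timing them.
stopifnot(shifted(big_v, pat) == windowed(big_v, pat))

system.time(shifted(big_v, pat))
system.time(windowed(big_v, pat))
```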

--

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
