Hello,

The function Biostrings::matchPattern can be called with an algorithm = "boyer-moore" argument.
I have never used it myself; I found it in the result of

library(sos)
r1 <- findFn('boyer')
r2 <- findFn('moore')
r1 & r2

I have implemented the Boyer-Moore algorithm a couple of times, the first time(!) in 8086 assembly, but I see a difficulty with your original request. A Boyer-Moore search for subsequences of character vectors whose elements all have nchar(x) equal to 1 should be very easy to implement using the .Call interface, but for integer vectors I do not see how to implement the bad character shift table. What would the alphabet be? The set of all 32-bit integers? In that case the table length would be prohibitive...
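
One possible way around the table size, offered only as a sketch: the bad character table needs an entry only for values that actually occur in the pattern, since every other value shifts by the full pattern length. A hashed environment keyed by the values can therefore stand in for a table over the whole alphabet. The code below uses the simplified Horspool variant of Boyer-Moore in plain R rather than .Call. The name find_subseq_bmh is hypothetical, and the sketch assumes that x and pattern are atomic vectors without NA values.

```r
# Hypothetical helper: Boyer-Moore-Horspool search with a sparse shift table.
# Assumes atomic vectors without NA; returns the start positions of all matches.
find_subseq_bmh <- function(x, pattern) {
  n <- length(x)
  m <- length(pattern)
  if (m == 0L || m > n) return(integer(0))

  # Coerce both vectors to a common type so that equal values get equal keys.
  both <- c(x, pattern)
  x <- both[seq_len(n)]
  pattern <- both[n + seq_len(m)]

  # Sparse bad character table: only values in pattern[1:(m - 1)] get an entry.
  # The "k" prefix keeps every key a valid, non-empty name.
  shift <- new.env(hash = TRUE, parent = emptyenv())
  for (j in seq_len(m - 1L)) {
    assign(paste0("k", as.character(pattern[j])), m - j, envir = shift)
  }

  hits <- integer(0)
  i <- 1L  # start of the current window in x
  while (i <= n - m + 1L) {
    if (isTRUE(all(x[i:(i + m - 1L)] == pattern))) hits <- c(hits, i)
    # Shift according to the last element of the current window;
    # values that are not in the table shift by the full pattern length.
    s <- shift[[paste0("k", as.character(x[i + m - 1L]))]]
    i <- i + if (is.null(s)) m else s
  }
  hits
}

find_subseq_bmh(1:100, c(49, 50, 51))
```

The table holds at most length(pattern) - 1 entries regardless of how large the alphabet is. Whether this beats a simple vectorised comparison in practice is untested here; an R-level loop like this one is likely slow, so a real implementation would probably still go through .Call.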

Ideas anyone?

Rui Barradas

On 28-08-2012 22:05, Duncan Murdoch wrote:
Is there a function to efficiently search for a subsequence within a vector?

For example, with

x <- 1:100

I'd like to search for the sequence c(49,50,51), and be told that it occurs exactly once, starting at location 49. (The items in the vectors might be numeric or character, and there might be repetitions within the search pattern or within the vector I'm searching.)
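
To make the requested behaviour concrete, here is a minimal baseline sketch that compares the pattern against every window of the vector, one pattern element at a time. The name find_subseq is hypothetical and not an existing R function. It makes no claim to be efficient, since it does work proportional to length(x) * length(pattern), and it assumes atomic vectors.

```r
# Hypothetical helper: naive vectorised subsequence search.
# Returns the start positions of all occurrences of pattern in x.
find_subseq <- function(x, pattern) {
  n <- length(x)
  m <- length(pattern)
  if (m == 0L || m > n) return(integer(0))

  starts <- seq_len(n - m + 1L)      # candidate start positions
  ok <- rep(TRUE, length(starts))
  for (j in seq_len(m)) {
    # Keep only the candidates whose j-th element matches pattern[j].
    ok <- ok & (x[starts + j - 1L] == pattern[j])
  }
  which(ok)                          # NA comparisons count as no match
}

x <- 1:100
find_subseq(x, c(49, 50, 51))        # 49
```

Repetitions are handled as well: find_subseq(c(1, 2, 1, 2, 1), c(1, 2, 1)) reports the overlapping matches at positions 1 and 3.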

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
