On Tue, Sep 27, 2011 at 5:51 PM, Marcelo Araya <[email protected]> wrote:
> Hi all,
>
> I am analyzing bird song element sequences. I would like to know how to
> count how many times a given subsequence occurs in a single string sequence.
>
> For example, if I have this single sequence:
>
> ABCABAABABABCAB
>
> and I am looking for the subsequence "ABC", what I need to get is that the
> subsequence is found twice.
>
> Any idea how I can do this?
>
gregexpr will return the position and length of multiple matches. And
you can feed it a vector. So:
> songs=c("ABCABAABABABCAB","ABACAB","ABABCABCBC")
> m="ABC"
> gregexpr(m,songs)
[[1]]
[1] 1 11
attr(,"match.length")
[1] 3 3
[[2]]
[1] -1
attr(,"match.length")
[1] -1
[[3]]
[1] 3 6
attr(,"match.length")
[1] 3 3
- in the first item, it was found at positions 1 and 11
- in the second, it wasn't found at all (gregexpr reports -1)
- in the third, it was found at positions 3 and 6
so just do some apply-ing to the returned list and count the positive
positions in each element (a plain length() would report 1 for the -1
"no match" case). Job done!
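A minimal sketch of that last step, assuming the pattern is the literal string "ABC" from the question. The names hits and counts are just illustrative, and fixed = TRUE is my addition so the pattern is not interpreted as a regular expression:

```r
songs <- c("ABCABAABABABCAB", "ABACAB", "ABABCABCBC")
m <- "ABC"

# fixed = TRUE treats m as a literal string rather than a regex
hits <- gregexpr(m, songs, fixed = TRUE)

# gregexpr() returns -1 when nothing matches, so count only the
# positive start positions instead of taking length()
counts <- sapply(hits, function(x) sum(x > 0))
names(counts) <- songs
counts
```

This should give 2, 0 and 2 for the three songs. Note that gregexpr reports non-overlapping matches, which makes no difference for "ABC" but would matter for a pattern that can overlap itself, such as "ABA".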
Barry
PS bonus points for spotting the hidden prog-rock song title.