On Wed, Jul 7, 2010 at 12:25 PM, Immanuel <mane.d...@googlemail.com> wrote:
> Hello everyone,
>
>
> I'm looking for advice on how to do some tests on strings.
> What I want to do is the following:
>
> (Just an example; the real strings/sequences are about 200-400 characters long.)
> Given a set of strings:
>
> String1 abcdefgh
> String2 bcdefgop
>
> Use a sliding window of size x to create a vector of all subsequences
> of size x found in the set (order matters!).
>
> Now create, for every string in the set, a vector containing the counts
> of how often each subsequence was found in that particular string.
>
> It would be great if someone could give me a rough outline of how to
> start and which functions to use.
> I read through the man pages and googled a lot, but still don't know
> how to approach this.
>

Try this:

# generate an input string n long
set.seed(123)
n <- 300
lets <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")

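# extract every k-length window (start i, end i + k - 1) and count
# how often each distinct window occurs; substring() is vectorized
# over its start and stop arguments
k <- 3
table(substring(lets, 1:(n-k+1), k:n))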
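The snippet above counts windows in a single string. To get one count vector per string over the subsequences found in the whole set, something along these lines might work. It is only a sketch: the helper name kmers and the objects strings, vocab and counts are made up for illustration, and the two example strings are the ones from the question.

```r
# Hypothetical helper: all length-k windows of s, in order of occurrence
kmers <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))
  substring(s, 1:(n - k + 1), k:n)
}

# example strings taken from the question
strings <- c(String1 = "abcdefgh", String2 = "bcdefgop")
k <- 3

# windows per string, then the distinct windows seen anywhere in the set
per_string <- lapply(strings, kmers, k = k)
vocab <- sort(unique(unlist(per_string)))

# one row per string, one column per window; factor() with fixed levels
# makes table() report zero counts for windows absent from a string
counts <- t(vapply(per_string,
                   function(x) as.vector(table(factor(x, levels = vocab))),
                   integer(length(vocab))))
colnames(counts) <- vocab
counts
```

Each row of counts is then the count vector for one string. For strings of 200-400 characters and a small k this should stay cheap, though the number of columns can grow quickly as k increases.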

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.