On Wed, Jul 7, 2010 at 12:25 PM, Immanuel <mane.d...@googlemail.com> wrote:
> Hello together,
>
> I'm looking for advice on how to do some tests on strings.
> What I want to do is the following:
>
> (just an example, real strings/sequences are about 200-400 characters long)
> given a set of strings:
>
> String1 abcdefgh
> String2 bcdefgop
>
> use a sliding window of size x to create a vector of all subsequences
> of size x found in the set (order matters!).
>
> Now create, for every string in the set, a vector containing the counts
> of how often each subsequence was found in this particular string.
>
> It would be great if someone could give me a vague outline on how to
> start and which methods to use.
> I did read through the man pages and googled a lot, but still don't know
> how to approach this.
>
Try this:

# generate an input string n long
set.seed(123)
n <- 300
lets <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")

# get rolling k-length sequences and count
k <- 3
table(substring(lets, 1:(n-k+1), k:n))

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
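
The snippet in the reply counts windows in a single string. The question asks for one count vector per string over the subsequences found anywhere in the set, so here is a rough sketch of how the same substring() idea might be extended. The helper name kmers and the variables strings, vocab and counts are made up for illustration and are not from the original reply; the two input strings are the ones given in the question, and a window size of 3 is assumed.

```r
# Hypothetical helper (not part of the original reply):
# return all length-k windows of a single string, in order.
kmers <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))
  starts <- seq_len(n - k + 1)
  substring(s, starts, starts + k - 1)
}

# Example strings from the question; window size is an assumption
strings <- c(String1 = "abcdefgh", String2 = "bcdefgop")
k <- 3

# Windows for each string, then the shared set of subsequences,
# kept in order of first appearance
per_string <- lapply(strings, kmers, k = k)
vocab <- unique(unlist(per_string, use.names = FALSE))

# One row per string, one column per subsequence.
# tabulate() with nbins fixed keeps zero counts for subsequences
# that do not occur in a given string.
counts <- t(vapply(
  per_string,
  function(x) tabulate(match(x, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(counts) <- vocab
counts
```

Each row of counts is then the count vector for one string, with the columns in the same order for every string, so rows can be compared directly. For strings of 200-400 characters this should be fast enough, though the number of columns can grow quickly with larger k or a larger alphabet.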