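How about this:

x <- "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
x.s <- unlist(strsplit(x, ":"))   # split the string into individual ids
# window sizes 2..n (seq_len(n)[-1] is empty when n == 1, unlike 2:1)
for (i in seq_len(length(x.s))[-1]) {
  # each row of embed(n:1, i) holds the indices of one length-i window
  x.seq <- embed(length(x.s):1, i)
  # paste each window back together and tabulate the resulting strings
  print(table(apply(x.seq, 1, function(z) {
    paste(x.s[z], collapse = ":")
  })))
}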

A00096:A00096 A00096:A02178 A02178:A02178 A02178:A07776
            3             1             1             1

A00096:A00096:A00096 A00096:A00096:A02178 A00096:A02178:A02178 A02178:A02178:A07776
                   2                    1                    1                    1

A00096:A00096:A00096:A00096 A00096:A00096:A00096:A02178 A00096:A00096:A02178:A02178 A00096:A02178:A02178:A07776
                          1                           1                           1                           1

A00096:A00096:A00096:A00096:A02178 A00096:A00096:A00096:A02178:A02178 A00096:A00096:A02178:A02178:A07776
                                 1                                  1                                  1

A00096:A00096:A00096:A00096:A02178:A02178 A00096:A00096:A00096:A02178:A02178:A07776
                                        1                                         1

A00096:A00096:A00096:A00096:A02178:A02178:A07776
                                               1

On Fri, Apr 17, 2009 at 9:33 AM, Albert Vilella <avile...@gmail.com> wrote:
> Starting with the first entry:
> A00096:A00096:A00096:A00096:A02178:A02178:A07776
>
> and supposing there aren't any other identical subvectors in the set, the
> algorithm will slide through the vector, first in pairs, then in trios, then
> in sets of four, etc., and count the occurrences:
>
> A00096:A00096 3
> A00096:A02178 1
> A02178:A02178 1
> A02178:A07776 1
> A00096:A00096:A00096 2
> A00096:A00096:A02178 1
> A00096:A02178:A02178 1
> A02178:A02178:A07776 1
> A00096:A00096:A00096:A00096 1
> A00096:A00096:A00096:A02178 1
> A00096:A00096:A02178:A02178 1
> A00096:A02178:A02178:A07776 1
> A00096:A00096:A00096:A00096:A02178 1
> A00096:A00096:A00096:A02178:A02178 1
> A00096:A00096:A02178:A02178:A07776 1
> A00096:A00096:A00096:A00096:A02178:A02178 1
> A00096:A00096:A00096:A02178:A02178:A07776 1
> A00096:A00096:A00096:A00096:A02178:A02178:A07776 1
>
> On Fri, Apr 17, 2009 at 1:04 PM, jim holtman <jholt...@gmail.com> wrote:
>>
>> Can you provide the output that you would expect from the data you
>> gave. I am not sure what you mean by a 'subvector'.
>>
>> On Fri, Apr 17, 2009 at 5:25 AM, Albert Vilella <avile...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I've got a list of ~20000 elements that look like this:
>> >
>> > [1] "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
>> > [2] "A00046:A00076:A01101:A04146:A05671:A07169"
>> > [3] "A00038:A00932:A02185:A02370:A02818:A02818:A02818:A02818:A04732:A07142:A07142"
>> > [4] "A00096:A01352:A01352:A02023:A05001:A05001:A07776"
>> > [5] "A00036:A00047:A00059:A00503:A00904:A00904:A00904:A01023:A01023:A01399:A02029:A03941:A07679"
>> > [6] "A00041:A00533:A00855:A02178:A02178:A02178:A05671:A05671:A05671:A05671:A05671:A05671:A05671"
>> > ...
>> >
>> > And I would like to have a table with the frequency of occurrences for
>> > matching subvectors in all elements, i.e., not only the number of times
>> > a vector is found but also how many times a subvector (of at least 2 ids)
>> > is found.
>> >
>> > How can I do that?
>> > Thanks in advance,
>> > Albert.
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
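
The loop above tabulates the windows of a single string, while the original question asks for counts pooled over all ~20000 elements. Below is a minimal sketch of one way to do that in base R, assuming the ids are always separated by ":" and that overlapping windows are each counted (as in Albert's expected output, where A00096:A00096 occurs 3 times). count_subvectors() is a hypothetical helper name, not an existing R function; it walks window start positions directly instead of using embed().

```r
# count_subvectors() is a hypothetical helper (base R only): it counts every
# contiguous run of at least `min.len` ids in each element of `ids` and pools
# the counts over all elements. Overlapping runs are each counted once.
count_subvectors <- function(ids, min.len = 2, sep = ":") {
  parts <- strsplit(ids, sep, fixed = TRUE)
  runs <- unlist(lapply(parts, function(s) {
    n <- length(s)
    if (n < min.len) return(character(0))
    # for every window length k, paste each length-k run back together
    unlist(lapply(min.len:n, function(k) {
      vapply(seq_len(n - k + 1), function(start) {
        paste(s[start:(start + k - 1)], collapse = sep)
      }, character(1))
    }))
  }))
  # as.character() turns a NULL (empty input) into character(0)
  sort(table(as.character(runs)), decreasing = TRUE)
}

x <- c("A00096:A00096:A00096:A00096:A02178:A02178:A07776",
       "A00096:A01352:A01352:A02023:A05001:A05001:A07776")
tab <- count_subvectors(x)
tab["A00096:A00096"]   # 3, all from the first element
sum(tab)               # 42: 21 runs of length 2..7 in each 7-id element
```

sort() puts the most frequent subvectors first; drop it to get the alphabetical order that table() returns.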