On Thu, Jul 12, 2012 at 03:51:54PM -0500, Vineet Shukla wrote:
> I have independent event sequences for example as follows :
>
> Independent event sequence 1 : A , B , C , D
> Independent event sequence 2 : A, C , B
> Independent event sequence 3 :D, A, B, X,Y, Z
> Independent event sequence 4 :C,A,A,B
> Independent event sequence 5 :B,A,D
>
> I want to able to find that most common sequence patters as
>
> {A, B } = > 3
> from lines 1,3,5.
>
> Pls note that A,C,B must not be considered because C comes in between
> and line 5 also must not be considered because order of A,B is reversed.

Hi.

If I understand correctly, the first sequence contains the patterns
AB, BC, CD. Under this interpretation, AB occurs at lines 1,3,4 and
not at lines 1,3,5. Is this correct?

If a sequence contains several occurrences of a pattern, for example,
the sequence A, B, A, B contains AB twice, is the pattern counted
only once?

If this is correct, then try the following.

  # your input list
  lst <- list(
    c("A", "B", "C", "D"),
    c("A", "C", "B"),
    c("D", "A", "B", "X", "Y", "Z"),
    c("C", "A", "A", "B"),
    c("B", "A", "D"))

  # extract unique patterns from a single sequence as rows of a matrix
  # lpattern is the length of the patterns
  # rev() makes each row of embed() read in the original order
  singleSeq <- function(x, lpattern)
  {
      unique(embed(rev(x), lpattern))
  }

  lst1 <- lapply(lst, singleSeq, lpattern=2)

  # combine the matrices to a single matrix
  mat <- do.call(rbind, lst1)

  # convert the patterns to strings
  pat <- do.call(paste, c(data.frame(mat), sep=""))

  # count the number of sequences containing each pattern
  out <- table(pat)
  out

  pat
  AA AB AC AD BA BC BX CA CB CD DA XY YZ
   1  3  1  1  1  1  1  1  1  1  1  1  1

  # the most frequent pattern
  names(out)[which.max(out)]

  [1] "AB"

Hope this helps.

Petr Savicky.
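
P.S. If you also need to know which lines each pattern comes from (as in
"from lines 1,3,4"), the following is a minimal sketch along the same
lines, not a tested general solution. The names seqPatterns, pats,
lineIdx and where are only illustrative and do not come from any
package. The sketch assumes that a sequence shorter than the pattern
length should simply contribute no patterns, since embed() stops with
an error in that case.

```r
# input list, as in the example above
lst <- list(
  c("A", "B", "C", "D"),
  c("A", "C", "B"),
  c("D", "A", "B", "X", "Y", "Z"),
  c("C", "A", "A", "B"),
  c("B", "A", "D"))

# unique contiguous patterns of length lpattern in one sequence, as strings
# (assumption: sequences shorter than lpattern yield no patterns)
seqPatterns <- function(x, lpattern)
{
    if (length(x) < lpattern) return(character(0))
    m <- embed(rev(x), lpattern)
    unique(apply(m, 1, paste, collapse=""))
}

pats <- lapply(lst, seqPatterns, lpattern=2)

# index of the sequence that each extracted pattern belongs to
lineIdx <- rep(seq_along(pats), lengths(pats))

# for every pattern, the indices of the sequences containing it
where <- split(lineIdx, unlist(pats))

where[["AB"]]
```

For the input above, where[["AB"]] gives 1 3 4, and lengths(where)
reproduces the counts obtained with table(pat).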