Hi Petr,

Yes, that's really very helpful.

Petr: Using this interpretation, AB occurs at lines 1,3,4 and not 1,3,5.
Is this correct?

Vineet: Yes, that's right. Sorry for the typo.

Petr: If some sequence contains several occurrences of a pattern, for
example, the sequence A, B, A, B contains AB twice, then it is counted
only once?

Vineet: What needs to be done if I would like to count it as many times as
it occurred? Should I remove the call to unique() from
"unique(embed(rev(x), lpattern))"?

Rgds,
Vineet

On Fri, Jul 13, 2012 at 3:36 AM, Petr Savicky <savi...@cs.cas.cz> wrote:

> On Thu, Jul 12, 2012 at 03:51:54PM -0500, Vineet Shukla wrote:
> > I have independent event sequences, for example as follows:
> >
> > Independent event sequence 1 : A, B, C, D
> > Independent event sequence 2 : A, C, B
> > Independent event sequence 3 : D, A, B, X, Y, Z
> > Independent event sequence 4 : C, A, A, B
> > Independent event sequence 5 : B, A, D
> >
> > I want to be able to find the most common sequence patterns as
> >
> > {A, B} => 3
> >
> > from lines 1,3,5.
> >
> > Pls note that A,C,B must not be considered because C comes in between
> > and line 5 also must not be considered because order of A,B is reversed.
>
> Hi.
>
> If I understand correctly, the first sequence contains patterns
>
> AB, BC, CD.
>
> Using this interpretation, AB occurs at lines 1,3,4 and not 1,3,5.
> Is this correct?
>
> If some sequence contains several occurrences of a pattern, for example,
> the sequence
>
> A, B, A, B
>
> contains AB twice, then it is counted only once?
>
> If this is correct, then try the following
>
>   # your input list
>   lst <- list(
>     c("A", "B", "C", "D"),
>     c("A", "C", "B"),
>     c("D", "A", "B", "X", "Y", "Z"),
>     c("C", "A", "A", "B"),
>     c("B", "A", "D"))
>
>   # extract unique patterns from a single sequence as rows of a matrix
>   # lpattern is the length of the patterns
>   singleSeq <- function(x, lpattern)
>   {
>     unique(embed(rev(x), lpattern))
>   }
>
>   lst1 <- lapply(lst, singleSeq, lpattern=2)
>   # combine the matrices to a single matrix
>   mat <- do.call(rbind, lst1)
>   # convert the patterns to strings
>   pat <- do.call(paste, c(data.frame(mat), sep=""))
>   out <- table(pat)
>   out
>
>   pat
>   AA AB AC AD BA BC BX CA CB CD DA XY YZ
>    1  3  1  1  1  1  1  1  1  1  1  1  1
>
>   names(out)[which.max(out)]
>
>   [1] "AB"
>
> Hope this helps.
>
> Petr Savicky.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
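
For the question about counting a pattern as many times as it occurs, here is a minimal sketch that simply leaves out the unique() call, as suggested in the question. It reuses the lst data from the quoted message. The names allPatterns and countPatterns are made up for this illustration and do not appear in the original code. It assumes every sequence has at least lpattern elements; shorter ones would need to be filtered out first, since embed() rejects them.

```r
# same input list as in the quoted message
lst <- list(
  c("A", "B", "C", "D"),
  c("A", "C", "B"),
  c("D", "A", "B", "X", "Y", "Z"),
  c("C", "A", "A", "B"),
  c("B", "A", "D"))

# allPatterns (hypothetical name): like singleSeq, but without unique(),
# so a pattern repeated within one sequence yields one row per occurrence
allPatterns <- function(x, lpattern)
{
  embed(rev(x), lpattern)
}

# countPatterns (hypothetical helper): stack the per-sequence matrices,
# paste each row into a string, and tabulate the strings
countPatterns <- function(lst, lpattern)
{
  mat <- do.call(rbind, lapply(lst, allPatterns, lpattern = lpattern))
  pat <- do.call(paste, c(data.frame(mat), sep = ""))
  table(pat)
}

# the five sequences above contain no repeated pattern within a sequence,
# so the counts match the table in the quoted message
out <- countPatterns(lst, 2)

# a sequence with a repeated pattern: AB is now counted twice, BA once
out2 <- countPatterns(list(c("A", "B", "A", "B")), 2)
```

With unique() in place, the sequence A, B, A, B would contribute AB only once. Without it, each occurrence adds to the count, so which.max() then picks the pattern with the most occurrences overall rather than the one found in the most sequences.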