Here is an approach that will work as long as there are less than 32 unique characters in the strings. It uses logical operations to make the comparison
>library(bitops) > x <- scan(textConnection('a,b,c,d + a,b,c + b,c + a,b,c,d,e'), sep='\n', what='') Read 4 items > x <- strsplit(x, ',') > x [[1]] [1] "a" "b" "c" "d" [[2]] [1] "a" "b" "c" [[3]] [1] "b" "c" [[4]] [1] "a" "b" "c" "d" "e" > strings <- unique(unlist(x)) # get unique characters > # convert to integers to use 'bit' operations > x.i <- sapply(x, function(.str){ + as.integer(sum(2^(match(.str, strings) - 1))) + }) > # create a vector of masks for testing > mask <- 1:(2^(length(strings)) - 1) > # 'remove' entries with single characters > mask[2^(0:(length(strings) - 1))] <- -1 > #setup the strings corresponding to the mask vector > char.mask <- character(length(mask)) > temp.mask <- mask # we will be processing this vector > for (i in seq_along(mask)){ + j <- which(bitAnd(temp.mask, 1) == 1) + char.mask[j] <- paste(char.mask[j], strings[i]) + temp.mask <- bitShiftR(temp.mask, 1) + } > # now count the matches we have with the masks > count <- sapply(mask, function(.mask){ + # if 'and' result is the same as the mask => those characters are in the string + sum(bitAnd(x.i, .mask) == .mask) + }) > # determine which counts >= 2 > gte.2 <- which(count >= 2) > cbind(char.mask[gte.2], count[gte.2]) [,1] [,2] [1,] " a b" "3" [2,] " a c" "3" [3,] " b c" "4" [4,] " a b c" "3" [5,] " a d" "2" [6,] " b d" "2" [7,] " a b d" "2" [8,] " c d" "2" [9,] " a c d" "2" [10,] " b c d" "2" [11,] " a b c d" "2" > > On 10/18/07, Weiwei Shi <[EMAIL PROTECTED]> wrote: > The rule is search on any combinations whose size is more than 1, like > ab, ac, abc, ... > > The storage can be like b2 > > b2 > [[1]] > [1] "a" "b" "c" "d" > > [[2]] > [1] "a" "b" "c" > > [[3]] > [1] "b" "c" > > [[4]] > [1] "a" "b" "c" "d" "e" > > > On 10/18/07, Nordlund, Dan (DSHS/RDA) <[EMAIL PROTECTED]> wrote: > > > -----Original Message----- > > > From: [EMAIL PROTECTED] > > > [mailto:[EMAIL PROTECTED] On Behalf Of Weiwei Shi > > > Sent: Thursday, October 18, 2007 3:32 PM > > > To: jim holtman > > > Cc: [EMAIL PROTECTED] > > > Subject: Re: [R] alignment algorithm or pattern frequency calculation > > > > > > for example, > > > > > > a,b: 3 which means a and b appear together 3 times in my input list. > > > > > > On 10/18/07, jim holtman <[EMAIL PROTECTED]> wrote: > > > > It would be helpful if you explained what the numbers mean. > > > > > > > > On 10/18/07, Weiwei Shi <[EMAIL PROTECTED]> wrote: > > > > > Hi, > > > > > > > > > > I am looking for an algorithm (better if it is > > > implemented in R) which > > > > > can do the following: > > > > > > > > > > from the following list: > > > > > a,b,c,d > > > > > a,b,c > > > > > b,c > > > > > a,b,c,d,e > > > > > > > > > > to calculate > > > > > a,b,c,d: 2 > > > > > a,b,c: 3 > > > > > a,b: 3 > > > > > a,c: 3 > > > > > b,c: 4 > > > > > b,c,d:2 > > > > > > > > > > here, the order is not important. > > > > > > > > > > Thanks. > > > > > > > > > I guess I have more questions than answers. > > 1. What are the rules for deciding on which patterns to search for? You > > don't list c,d or c,d,e, or single character patterns. > > 2. Are these patterns stored in comma separated strings or as character > > vectors or...? > > > > Dan > > > > Daniel J. Nordlund > > Research and Data Analysis > > Washington State Department of Social and Health Services > > Olympia, WA 98504-5204 > > > > > > > > > > > -- > Weiwei Shi, Ph.D > Research Scientist > GeneGO, Inc. > > "Did you always know?" > "No, I did not. But I believed..." > ---Matrix III > -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.