Hi,

It is not just 79 triplets. As I said, there are 79 codes. I am making triplets out of those 79 codes and matching the triplets against the list.

Please find the dput of the data below.

> dput(head(newd,10))
structure(list(uniq_id = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
    hi = c("11, 22, 84, 85, 108, 111", "18, 84, 85, 87, 122, 134", "2, 18, 22",
    "18, 108, 122, 134, 176", "19, 85, 87, 100, 107", "79, 85, 111",
    "11, 88, 108", "19, 88, 96", "19, 85, 96", "19, 100, 103")),
    .Names = c("uniq_id", "hi"), row.names = c(NA, -10L),
    class = c("tbl_df", "tbl", "data.frame"))

I am trying to count the frequency of the triplets in the above data using the code below.

# split column into a list; the separator is a comma plus optional whitespace,
# so that "22" and " 22" are not treated as two different codes
myList <- strsplit(newd$hi, split=",\\s*")
# get all triplet combinations of the unique codes
myCombos <- t(combn(unique(unlist(myList)), 3))
# count the records in which all three codes of a triplet are present
myCounts <- sapply(1:nrow(myCombos), FUN=function(i) {
  sum(sapply(myList, function(j) {
    sum(!is.na(match(c(myCombos[i,]), j)))})==3)})
# final matrix
final <- cbind(matrix(as.integer(myCombos), nrow(myCombos)), myCounts)

I hope I have made my point clear. Please let me know if I missed anything.

Regards,
Sri

On Wed, Jul 27, 2016 at 11:19 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> You said you had 79 triplets and 8000 records.
>
> When I compared 100 triplets to 10000 records it took 86 seconds.
>
> So obviously there is something you're not telling us about the format
> of your data.
>
> If you use dput() to provide actual examples, you will get better
> results than if we on Rhelp have to guess. Because we tend to guess in
> ways that make the most sense after extensive R experience, and that's
> probably not what you have.
>
> Sarah
>
> On Wed, Jul 27, 2016 at 1:29 PM, sri vathsan <srivib...@gmail.com> wrote:
> > Hi,
> >
> > Thanks for the solution. But I am afraid that after running this code
> > it still takes a long time. It has been an hour and it is still executing.
> > I understand the delay, because each triplet has to be compared against
> > almost 9000 elements.
> >
> > Regards,
> > Sri
> >
> > On Wed, Jul 27, 2016 at 9:02 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> It's really a good idea to use dput() or some other reproducible way
> >> to provide data. I had to guess as to what your data looked like.
> >>
> >> It appears that order doesn't matter?
> >>
> >> Given that, here's one approach:
> >>
> >> combs <- structure(list(V1 = c(65L, 77L, 55L, 23L, 34L),
> >>     V2 = c(23L, 34L, 34L, 77L, 65L), V3 = c(77L, 65L, 23L, 34L, 55L)),
> >>     .Names = c("V1", "V2", "V3"), class = "data.frame",
> >>     row.names = c(NA, -5L))
> >>
> >> dat <- list(
> >>     c(77,65,34,23,55),
> >>     c(65,23,77,65,55,34),
> >>     c(77,34,65),
> >>     c(55,78,56),
> >>     c(98,23,77,65,34))
> >>
> >> sapply(seq_len(nrow(combs)), function(i) sum(sapply(dat,
> >>     function(j) all(combs[i,] %in% j))))
> >>
> >> On a dataset of comparable size to yours, it takes me under a minute
> >> and a half.
> >>
> >> > combs <- combs[rep(1:nrow(combs), length=100), ]
> >> > dat <- dat[rep(1:length(dat), length=10000)]
> >> >
> >> > dim(combs)
> >> [1] 100 3
> >> > length(dat)
> >> [1] 10000
> >> >
> >> > system.time(test <- sapply(seq_len(nrow(combs)),
> >> >     function(i) sum(sapply(dat, function(j) all(combs[i,] %in% j)))))
> >>    user  system elapsed
> >>  86.380   0.006  86.391
> >>
> >> On Wed, Jul 27, 2016 at 10:47 AM, sri vathsan <srivib...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > Apologies for the lack of information.
> >> >
> >> > Basically, myCombos is a matrix with 3 columns; each row is a triplet,
> >> > i.e. a combination of the 79 codes. There are around 3 lakh such
> >> > combinations, and it looks like below.
> >> >
> >> > V1 V2 V3
> >> > 65 23 77
> >> > 77 34 65
> >> > 55 34 23
> >> > 23 77 34
> >> > 34 65 55
> >> >
> >> > Each triplet is compared against a list (mylist) having 8177 elements,
> >> > which looks like below.
> >> >
> >> > 77,65,34,23,55
> >> > 65,23,77,65,55,34
> >> > 77,34,65
> >> > 55,78,56
> >> > 98,23,77,65,34
> >> >
> >> > Now I want to count the number of occurrences of each triplet in the
> >> > above list. I.e., the triplet 65 23 77 is seen 3 times in the list.
> >> > So my output looks like below.
> >> >
> >> > V1 V2 V3 Freq
> >> > 65 23 77 3
> >> > 77 34 65 4
> >> > 55 34 23 2
> >> >
> >> > I hope I have made it clear this time.
> >> >
> >> > On Wed, Jul 27, 2016 at 7:00 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> >> >
> >> >> Not entirely sure I understand, but match() is already vectorized, so
> >> >> you should be able to lose the sapply(). This would speed things up a
> >> >> lot. Please re-read ?match *carefully*.
> >> >>
> >> >> Bert
> >> >>
> >> >> On Jul 27, 2016 6:15 AM, "sri vathsan" <srivib...@gmail.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I created a list of 3-number combinations (mycombos, around 3 lakh
> >> >> combinations) and am counting the occurrences of those combinations in
> >> >> another list. This comparison list (mylist) has around 8000 records. I
> >> >> am using the following code.
> >> >>
> >> >> myCounts <- sapply(1:nrow(myCombos), FUN=function(i) {
> >> >>   sum(sapply(myList, function(j) {
> >> >>     sum(!is.na(match(c(myCombos[i,]), j)))})==3)})
> >> >>
> >> >> The above code takes a very long time to execute. Is there any other,
> >> >> more efficient method that will reduce the time?
> >> >> --
> >> >>
> >> >> Regards,
> >> >> Srivathsan.K

--
Regards,
Srivathsan.K

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
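
For the triplet-counting problem discussed in this thread, one possible way to avoid testing every candidate triplet against every record is to go the other way round: list the triplets contained in each record and tabulate them once. This is a different technique from the code posted above, not a correction of it, and it is only a sketch. It assumes the records are comma-separated strings shaped like `newd$hi`, that order within a triplet does not matter, and that triplets which never occur can be left out of the result. The names `hi`, `recs`, `keys`, `counts` and `parts` are made up for illustration.

```r
# Sample records, copied from the dput() output earlier in the thread
hi <- c("11, 22, 84, 85, 108, 111",
        "18, 84, 85, 87, 122, 134",
        "2, 18, 22",
        "18, 108, 122, 134, 176",
        "19, 85, 87, 100, 107",
        "79, 85, 111",
        "11, 88, 108",
        "19, 88, 96",
        "19, 85, 96",
        "19, 100, 103")

# Split on a comma plus optional whitespace, convert to integer, and keep
# each record's codes unique and sorted so a triplet has one canonical order.
recs <- lapply(strsplit(hi, ",\\s*"),
               function(x) sort(unique(as.integer(x))))

# For every record with at least 3 codes, build one "a-b-c" key per triplet.
keys <- unlist(lapply(recs, function(x) {
  if (length(x) < 3) return(character(0))
  combn(x, 3, FUN = paste, collapse = "-")
}))

# A single table() call gives the frequency of every triplet that occurs.
counts <- table(keys)

# Rebuild the V1 V2 V3 Freq layout used in the thread.
parts <- do.call(rbind, strsplit(names(counts), "-", fixed = TRUE))
final <- data.frame(V1 = as.integer(parts[, 1]),
                    V2 = as.integer(parts[, 2]),
                    V3 = as.integer(parts[, 3]),
                    Freq = as.integer(counts))

# Most frequent triplets first
final <- final[order(-final$Freq), ]
head(final)
```

The work here grows with the number of triplets inside each record (choose(k, 3) for a record with k codes) rather than with the number of possible triplets times the number of records, so it should suit data where each record holds only a handful of codes. Triplets with a count of zero do not appear in `final`; if they are needed, they would have to be merged in from the full `combn()` listing.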