I have a list of length 750. Each element is a vector of unique strings (gene ids) of length up to ~40 (median 15). I want to compile a matrix of all possible triplets of gene ids, together with the number of list elements in which each triplet occurs. Using combn and a lot of looping I can do this, but it is VERY slow.
I've tried to figure out a way to vectorize this using "match" and "%in%", but can't get my mind around it. Below is my code; sig.tf.pairs is the list. Suggestions?

Mark

############################################################
M <- 3  # 3 for triplets, etc.
##########################################################
# count all triplets
all.triplets <- NULL
all.count.vec <- NULL
for (i in seq_along(sig.tf.pairs)) {
  if (length(sig.tf.pairs[[i]]) >= M) {
    # one column per triplet
    triplets <- combn(sig.tf.pairs[[i]], M, simplify = TRUE)
    # sort the ids within each triplet
    for (j in 1:ncol(triplets)) {
      o <- order(triplets[, j])
      triplets[, j] <- triplets[o, j]
    }
    count.vec <- rep(1, ncol(triplets))
    if (is.null(all.count.vec)) {
      all.count.vec <- count.vec
      all.triplets <- triplets
    } else {
      redundant.vec <- NULL
      for (k in 1:ncol(all.triplets)) {
        for (m in 1:ncol(triplets)) {
          # two triplets are the same if they share all M ids
          if (length(intersect(triplets[, m], all.triplets[, k])) == M) {
            all.count.vec[k] <- all.count.vec[k] + 1
            redundant.vec <- c(redundant.vec, m)
          }
        }
      }
      if (!is.null(redundant.vec)) {
        # drop = FALSE keeps a matrix even if one column remains
        triplets <- triplets[, -redundant.vec, drop = FALSE]
        count.vec <- count.vec[-redundant.vec]
      }
      all.triplets <- cbind(all.triplets, triplets)
      all.count.vec <- c(all.count.vec, count.vec)
    }
  }
}
###################################

--
Mark W. Kimpel MD
Neuroinformatics, Dept. of Psychiatry
Indiana University School of Medicine
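
One possible way to avoid the nested loops, offered as a sketch rather than a definitive answer: sort each gene vector, let combn paste every triplet into a single key string, and count the keys with table(). The toy list below is a made-up stand-in for sig.tf.pairs, the names eligible, keys, triplet.counts, triplet.matrix and result are only illustrative, and the "|" separator assumes that character never appears in a gene id.

```r
# Toy stand-in for sig.tf.pairs (hypothetical gene ids, not real data)
sig.tf.pairs <- list(
  c("g1", "g2", "g3", "g4"),
  c("g2", "g3", "g4"),
  c("g1", "g2"),             # shorter than M, so it is skipped
  c("g4", "g3", "g2", "g5")
)
M <- 3  # 3 for triplets, etc.

# Keep only elements long enough to yield at least one M-tuple
eligible <- sig.tf.pairs[lengths(sig.tf.pairs) >= M]

# For each element, build all M-tuples from the sorted ids and collapse
# each tuple to one key string. Sorting first means the same set of ids
# always produces the same key, whatever order it was stored in.
# Assumption: "|" never occurs inside a gene id.
keys <- unlist(lapply(eligible, function(ids) {
  combn(sort(ids), M, FUN = paste, collapse = "|")
}))

# Ids are unique within an element, so each key occurs at most once per
# element; table() therefore counts the elements containing each triplet.
triplet.counts <- sort(table(keys), decreasing = TRUE)

# Split the keys back into columns, one row per triplet
triplet.matrix <- do.call(rbind,
                          strsplit(names(triplet.counts), "|", fixed = TRUE))
result <- data.frame(triplet.matrix,
                     count = as.vector(triplet.counts),
                     stringsAsFactors = FALSE)
print(result)
```

On the toy data, the triplet g2/g3/g4 occurs in three elements and every other triplet in one. For the real list the key vector stays manageable: an element of length 40 contributes choose(40, 3) = 9880 keys, so 750 elements give at most a few million strings, and all comparison work is left to table() instead of the double loop over columns.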