Thanks to all for their suggestions. I apologize for not supplying a self-contained example; I should not post questions when I'm on the way out the door.
Martin's suggestion should work, but I need to put it on our high-performance system next week. On my local 64-bit Linux box with 4GB of RAM it blew up when a vector reached 2.6GB.

I may also get something to work using Charles' suggestion to use R's intrinsic table functions. I initially could not see how to do this with a vector of 3 elements, but I believe I can if I sort each vector, to obviate effects of order, and paste its elements together to make one unique string.

Once I get something that works and is as optimized as I can make it, I'll post it for future reference and for suggestions on further optimization.

Mark

Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN 46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)

mwkimpel<at>gmail<dot>com

******************************************************************

Charles C. Berry wrote:
> On Thu, 13 Mar 2008, Mark W Kimpel wrote:
>
>> I have a list (length 750), each element containing a vector of unique
>> strings (unique gene ids), with length up to ~40 (median 15). I want to
>> compile a matrix of all possible triplets and their frequency within
>> gene elements. Using combn and a lot of looping, I am accomplishing this
>> but it is VERY slow.
>>
>> I've tried to figure out a way to vectorize this, using "match" and
>> "%in%", but can't get my mind around it.
>>
>> Below is my code. sig.tf.pairs is the list. Suggestions?
>
> First, be sure that your code does what you really intend for it to do.
>
> Does this really do what you wanted?
>
>     if (length(intersect(triplets[,m], all.triplets[,k] == M))){
>
> If so, then why does the first line below never produce an error?
>
>     count.vec <- count.vec[,-redundant.vec]
>
>     is.null(dim(count.vec)) ## TRUE
>
> You are basically tabulating. Use the functions that are built for that.
>
> It looks like what you want is along these lines:
>
>     tab.combns <- function(x) apply( combn( sort(x), M ),2,
>                     function(x) paste(x,collapse=''))
>
>     tab.all <- table( unlist( lapply(sig.tf.pairs,tab.combns) ) )
>
> Chuck
>>
>> Mark
>>
>>
>> ############################################################
>> M <- 3 # 3 for triplets, etc.
>> ##########################################################
>> # count all triplets
>> all.triplets <- NULL
>> all.count.vec <- NULL
>> for (i in 1:length(sig.tf.pairs)){
>>   if (length(sig.tf.pairs[[i]] >= M)){
>>     triplets <- combn(sig.tf.pairs[[i]], M, simplify = TRUE)
>>     for (j in 1:ncol(triplets)){
>>       o <- order(triplets[,j])
>>       triplets[,j] <- triplets[o,j]
>>       count.vec <- rep(1, ncol(triplets))
>>     }
>>     if (is.null(all.count.vec)){
>>       all.count.vec <- count.vec
>>       all.triplets <- triplets
>>     } else {
>>       redundant.vec <- NULL
>>       for (k in 1:ncol(all.triplets)){
>>         for (m in 1:ncol(triplets)){
>>           if (length(intersect(triplets[,m], all.triplets[,k] == M))){
>>             all.count.vec[k] <- all.count.vec[k] + 1
>>             redundant.vec <- c(redundant.vec, m)
>>           }
>>         }
>>       }
>>       if(!is.null(redundant.vec)){
>>         triplets <- triplets[,-redundant.vec]
>>         count.vec <- count.vec[,-redundant.vec]
>>       }
>>       all.triplets <- cbind(all.triplets, triplets)
>>       all.count.vec <- c(all.count.vec, count.vec)
>>     }
>>   }
>> }
>> ###################################
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> Charles C. Berry                            (858) 534-2098
>                                             Dept of Family/Preventive Medicine
> E mailto:[EMAIL PROTECTED]                  UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
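
For future reference, the sort-and-paste idea discussed above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the final optimized code promised in this thread: the toy sig.tf.pairs data and the helper name combn.keys are made up here, and the "|" separator is an assumption that no gene id contains that character. A separator is used instead of collapse='' because, with ids of varying length, pasting with no separator could in principle give two different triplets the same string.

```r
# Hypothetical toy data standing in for sig.tf.pairs: a list of character
# vectors of unique gene ids (the ids are invented for illustration).
sig.tf.pairs <- list(
  c("g3", "g1", "g2", "g4"),
  c("g2", "g1", "g3"),
  c("g4", "g2"),              # shorter than M, so it contributes nothing
  c("g1", "g4", "g2", "g3")
)

M <- 3  # 3 for triplets, etc.

# Return one string key per M-subset of x. Sorting first makes the key
# independent of the order in which the ids appear in x.
# combn.keys is a hypothetical helper name, not from the original posts.
combn.keys <- function(x, m = M, sep = "|") {
  if (length(x) < m) return(character(0))
  combn(sort(x), m, FUN = paste, collapse = sep)
}

# Tabulate the keys over all list elements.
keys <- unlist(lapply(sig.tf.pairs, combn.keys))
tab.all <- table(keys)

# Split the keys back into one row per distinct triplet, plus its count.
triplet.mat <- do.call(rbind, strsplit(names(tab.all), "|", fixed = TRUE))
result <- data.frame(triplet.mat, count = as.vector(tab.all))
```

The keys vector holds one string per triplet occurrence, so its length is the sum of choose(length(x), M) over the list elements; for a vector of 40 ids and M = 3 that is choose(40, 3) = 9880 keys.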