> -----Original Message----- > From: William Dunlap > Sent: Thursday, December 04, 2008 9:59 AM > To: '[EMAIL PROTECTED]' > Cc: 'R help' > Subject: Re: [R] How to optimize this codes ? > > [R] How to optimize this codes ? > Daren Tan daren76 at hotmail.com > Thu Dec 4 17:02:49 CET 2008 > > How to optimize the for-loop to be reasonably fast for > sample.size=100000000 ? > You may want to change sample.size=1000 to have an idea > what I am achieving. > > set.seed(143) > A <- matrix(sample(0:1, sample.size, TRUE), ncol=10, > dimnames=list(NULL, LETTERS[1:10])) > > B <- list() > for(i in 1:10) { > B[[i]] <- apply(combn(LETTERS[1:10], i), 2, function(x) > { sum(apply(data.frame(A[,x]), 1, all)) }) > } > > Instead computing all(A[row,x]) each row of the matrix by looping > over the rows you could loop over the columns. That is generally > quicker if there are many more rows than columns. S+ and R have > functions called pmin and pmax that do this for min and max, but > no pany or pall functions. In your 0/1 case all is the same as min > so you can replace > apply(data.frame(A[,x]), 1, all) > with > sum(do.call("pmin", unname(A[,x,drop=FALSE]))) > after first converting A to a data.frame before starting to compute B. > (The do.call/unname combo is a hack to pass all the columns > of a data.frame > as separate arguments to a function. If pmin took a list > that wouldn't > be necessary.) > > When you do timings, looking at one size of problem often > isn't helpful, > as it doesn't tell you how the time depends on the size of > the input. The > relationship often is not linear. E.g., I put your method in > a function > called computeB0 and my modification of it into computeB1 and > wrote a makeA > that makes an A matrix with a given sample.size. Here are > the times, it looks > like 1000 is below the linear range for this example: > > sample.size time:computeB0 time:computeB1 > 1000 5.36 0.98 > 10000 36.82 2.49 > 100000 381.34 18.97 > > and here are the functions used > > makeA <- > function(sample.size){ > set.seed(143); > A<-matrix(sample(0:1,size=sample.size,replace=TRUE), > ncol=10, dimnames=list(NULL, LETTERS[1:10])); > A > } > computeB0 <- > function(A){B<-list() > for(i in 1:10) { > B[[i]] <- apply(combn(LETTERS[1:10], i), 2, function(x) { > sum(apply(data.frame(A[,x]), 1, all)) }) > } > B > } > computeB1 <- > function(A){ > A<-as.data.frame(A) > B<-list() > for(i in 1:10) { > B[[i]] <- apply(combn(LETTERS[1:10], i), 2, > FUN=function(x) { sum(do.call("pmin", > unname(A[,x,drop=FALSE]))) } > ) > } > B > } > > You could probably same more time by restructuring this as a > recursive function, > still operating on columns, that traversed the binary tree of > column inclusion/exclusion. > I just wanted to point out the advantage of looping over > columns instead of looping over > rows.
A hastily written recursive version is: computeB3 <- function(A) { B <- rep(list(integer(0)), ncol(A)) storage.mode(A) <- "logical" recurse <- function(thisCol = 1, includedCols = rep(NA, ncol(A)), allSoFar = rep(TRUE, nrow(A))) { if (thisCol < ncol(A)) # skip this column Recall(thisCol+1, replace(includedCols, thisCol, FALSE), allSoFar = allSoFar) else if (thisCol > ncol(A)) return() includedCols[thisCol] <- TRUE allSoFar <- allSoFar & A[,thisCol] nIncludedCols <- sum(includedCols[1:thisCol]) # cat(thisCol, ":", nIncludedCols, ":", includedCols[1:thisCol], sum(allSoFar), "\n") # Note B<<- in next line uses lexical scoping to change computeB3:B. Does not work in S+. B[[nIncludedCols]][length(B[[nIncludedCols]])+1] <<- sum(allSoFar) Recall(thisCol+1, includedCols, allSoFar = allSoFar) } recurse() B } The elements of B[[i]], for each i, are in a different order than they are in the other functions, but the histograms are the same. I added its time to the above table: sample.size time:computeB0 time:computeB1 time:computeB3 1000 5.36 0.98 0.12 10000 36.82 2.49 0.25 100000 381.34 18.97 1.62 1000000 ? 290.21 16.36 Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap tibco.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.