Re: [R] How to optimize this codes ?

Daren Tan Thu, 04 Dec 2008 17:20:22 -0800

Hi William, 
 
Thanks for your suggestion. I learn alot from it, some concepts are new which 
need some digestion. > Subject: Re: [R] How to optimize this codes ?> Date: 
Thu, 4 Dec 2008 11:49:30 -0800> From: [EMAIL PROTECTED]> To: [EMAIL PROTECTED]> 
CC: [EMAIL PROTECTED]> > > -----Original Message-----> > From: William Dunlap > 
> Sent: Thursday, December 04, 2008 9:59 AM> > To: '[EMAIL PROTECTED]'> > Cc: 
'R help'> > Subject: Re: [R] How to optimize this codes ?> > > > [R] How to 
optimize this codes ?> > Daren Tan daren76 at hotmail.com> > Thu Dec 4 17:02:49 
CET 2008> > > > How to optimize the for-loop to be reasonably fast for > > 
sample.size=100000000 ? > > You may want to change sample.size=1000 to have an 
idea > > what I am achieving. > > > > set.seed(143)> > A <- matrix(sample(0:1, 
sample.size, TRUE), ncol=10, > > dimnames=list(NULL, LETTERS[1:10]))> > > > B 
<- list()> > for(i in 1:10) { > > B[[i]] <- apply(combn(LETTERS[1:10], i), 2, 
function(x) > > { sum(apply(data.frame(A[!
 ,x]), 1, all)) })> > }> > > > Instead computing all(A[row,x]) each row of the 
matrix by looping> > over the rows you could loop over the columns. That is 
generally> > quicker if there are many more rows than columns. S+ and R have> > 
functions called pmin and pmax that do this for min and max, but> > no pany or 
pall functions. In your 0/1 case all is the same as min> > so you can replace> 
> apply(data.frame(A[,x]), 1, all)> > with> > sum(do.call("pmin", 
unname(A[,x,drop=FALSE])))> > after first converting A to a data.frame before 
starting to compute B.> > (The do.call/unname combo is a hack to pass all the 
columns > > of a data.frame> > as separate arguments to a function. If pmin 
took a list > > that wouldn't> > be necessary.)> > > > When you do timings, 
looking at one size of problem often > > isn't helpful,> > as it doesn't tell 
you how the time depends on the size of > > the input. The> > relationship 
often is not linear. E.g., I put your method in > > a function> > cal!
 led computeB0 and my modification of it into computeB1 and > > wrote a


 makeA> > that makes an A matrix with a given sample.size. Here are > > the 
times, it looks> > like 1000 is below the linear range for this example:> > > > 
sample.size time:computeB0 time:computeB1> > 1000 5.36 0.98> > 10000 36.82 
2.49> > 100000 381.34 18.97> > > > and here are the functions used> > > > makeA 
<-> > function(sample.size){> > set.seed(143);> > 
A<-matrix(sample(0:1,size=sample.size,replace=TRUE), > > ncol=10, 
dimnames=list(NULL, LETTERS[1:10]));> > A> > }> > computeB0 <-> > 
function(A){B<-list()> > for(i in 1:10) {> > B[[i]] <- 
apply(combn(LETTERS[1:10], i), 2, function(x) { > > 
sum(apply(data.frame(A[,x]), 1, all)) })> > }> > B> > }> > computeB1 <-> > 
function(A){> > A<-as.data.frame(A)> > B<-list()> > for(i in 1:10) {> > B[[i]] 
<- apply(combn(LETTERS[1:10], i), 2,> > FUN=function(x) { sum(do.call("pmin", > 
> unname(A[,x,drop=FALSE]))) }> > )> > }> > B> > }> > > > You could probably 
same more time by restructuring this as a > > recursive function,> > still ope!
 rating on columns, that traversed the binary tree of > > column 
inclusion/exclusion.> > I just wanted to point out the advantage of looping 
over > > columns instead of looping over> > rows.> > A hastily written 
recursive version is:> > computeB3 <-> function(A) {> B <- 
rep(list(integer(0)), ncol(A))> storage.mode(A) <- "logical"> recurse <- 
function(thisCol = 1, includedCols = rep(NA, ncol(A)),> allSoFar = rep(TRUE, 
nrow(A))) {> if (thisCol < ncol(A)) # skip this column> Recall(thisCol+1, 
replace(includedCols, thisCol, FALSE),> allSoFar = allSoFar)> else if (thisCol 
> ncol(A))> return()> includedCols[thisCol] <- TRUE> allSoFar <- allSoFar & 
A[,thisCol]> nIncludedCols <- sum(includedCols[1:thisCol])> # cat(thisCol, ":", 
nIncludedCols, ":", includedCols[1:thisCol],> sum(allSoFar), "\n")> # Note B<<- 
in next line uses lexical scoping to change> computeB3:B. Does not work in S+.> 
B[[nIncludedCols]][length(B[[nIncludedCols]])+1] <<- sum(allSoFar)> 
Recall(thisCol+1, includedCols,!
  allSoFar = allSoFar)> }> recurse()> B> }> > The elements of B[[i]], f

or each i, are in a different order than they> are in> the other functions, but 
the histograms are the same.> > I added its time to the above table:> > 
sample.size time:computeB0 time:computeB1 time:computeB3> 1000 5.36 0.98 0.12 > 
10000 36.82 2.49 0.25> 100000 381.34 18.97 1.62> 1000000 ? 290.21 16.36> > Bill 
Dunlap> TIBCO Software Inc - Spotfire Division> wdunlap tibco.com 
_________________________________________________________________


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to optimize this codes ?

Reply via email to