> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
> Sent: Thursday, October 28, 2010 9:20 AM
> To: Michael D
> Cc: r-help@r-project.org
> Subject: Re: [R] runtime on ising model
>
> On Oct 28, 2010, at 11:52 AM, Michael D wrote:
>
> > Mike, I'm not sure what you mean about removing foo but I think the
> > method is sound in diagnosing a program issue and the results speak
> > for themselves.
> >
> > I did invert my if statement at the suggestion of a CS professor
> > (who also suggested recoding in C, but I'm in an applied math
> > program and haven't had the time to take programming courses, which
> > I know would be helpful)
> >
> > Anyway, with the statement as:
> >
> > if( !(k %in% c(10^4,10^5,10^6,10^7)) ){
> >   #do nothing
> > } else {
> >   q <- q+1
> >   Out[[q]] <- M
> > }
> >
> > run times were back to around 20 minutes.

Did that one change really make a difference? R does not evaluate
anything in the if or else clauses of an if statement before
evaluating the condition.

> Have you tried replacing all of those 10^x operations with their
> integer equivalents, c(10000L, 100000L, 1000000L)? Each time through
> the loop you are unnecessarily calling the "^" function 4 times. You
> could also omit the last one, 10^7, during testing since M at the
> last iteration (k=10^7) would be the final value and you could just
> assign the state of M at the end. So we have eliminated 4*10^7
> unnecessary "^" calls and 10^7 unnecessary comparisons. (The CS
> professor is perhaps used to having the C compiler do all thinking of
> this sort for him.)

%in% is a relatively expensive function. Use == if you can. E.g.,
compare the following 2 ways of stashing something at times 1e4, 1e5,
and 1e6:

> set <- c(1e4, 1e5, 1e6)  # assumed: the definition of set was not shown
> system.time({z <- integer()
     for(k in seq_len(1e6))
        if(k %in% set) z[length(z)+1]<-k
     print(z)})
[1]   10000  100000 1000000
   user  system elapsed
 46.790   0.023  46.844
> system.time({z <- integer()
     nextCheckPoint <- 10^4
     for(k in seq_len(1e6))
        if( k == nextCheckPoint ) {
           nextCheckPoint <- nextCheckPoint * 10
           z[length(z)+1]<-k
        }
     print(z)})
[1]   10000  100000 1000000
   user  system elapsed
  4.529   0.013   4.545

With such a large number of iterations it pays to remove unneeded
function calls in arithmetic expressions. R does not optimize them
out - it is up to you to do that. E.g.,

> system.time(for(i in seq_len(1e6)) sign(pi)*(-1))
   user  system elapsed
  6.802   0.014   6.818
> system.time(for(i in seq_len(1e6)) -sign(pi))
   user  system elapsed
  3.896   0.011   3.911

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> --
> David
>
> > So as best I can tell something happens in the if statement causing
> > the computer to work ahead, as the professor suggests.
> > I'm no expert on R (and have no desire to try looking at the R
> > source code (it would only confuse me)) but if anyone can offer
> > guidance on how the if statement works (Does R try to work ahead?
> > Under what conditions does it try to "work ahead" so I can try to
> > exploit this behavior) I would greatly appreciate it.
> > If it would require too much knowledge of the computer system to
> > understand I doubt I would be able to make use of it, but maybe
> > someone else could benefit.
> >
> > On Tue, Oct 26, 2010 at 3:24 PM, Mike Marchywka
> > <marchy...@hotmail.com> wrote:
> >
> >> ----------------------------------------
> >>> Date: Tue, 26 Oct 2010 12:53:14 -0400
> >>> From: mike...@gmail.com
> >>> To: j...@bitwrit.com.au
> >>> CC: r-help@r-project.org
> >>> Subject: Re: [R] runtime on ising model
> >>>
> >>> I have an update on where the issue is coming from.
> >>>
> >>> I commented out the code for "pos[k+1] <- M[i,j]" and the if
> >>> statement for time = 10^4, 10^5, 10^6, 10^7 and the storage and
> >>> everything ran fast(er).
> >>> Next I added back in the "pos" statements and still runtimes were
> >>> good (around 20 minutes).
> >>>
> >>> So I'm left with something is causing problems in:
> >>
> >> I haven't looked at this since some passing interest in magnetics
> >> decades ago, something about 8-tracks and cassettes, but you have
> >> to be careful with conclusions like "I removed foo and problem
> >> went away therefore problem was foo." Performance issues are often
> >> caused by memory, not CPU limitations. Removing anything with a big
> >> memory footprint could speed things up. IO can be a real bottleneck.
> >> If you are talking about things on minute timescales, look at task
> >> manager and see if you are even CPU limited. Look for page faults
> >> or IO etc.
> >> If you really need performance and have a task which
> >> is relatively simple, don't ignore c++ as a way to generate data
> >> points and then import these into R for analysis.
> >>
> >> In short, just because you are focusing on math it doesn't mean
> >> the computer is limited by that.
> >>
> >>>
> >>> ## Store state at time 10^4, 10^5, 10^6, 10^7
> >>> if( k %in% c(10^4,10^5,10^6,10^7) ){
> >>>   q <- q+1
> >>>   Out[[q]] <- M
> >>> }
> >>>
> >>> Would there be any reason R is executing the statements inside the
> >>> "if" before getting to the logical check?
> >>> Maybe R is written to hope for the best outcome (TRUE) and will
> >>> just throw out its work if the logic comes up FALSE?
> >>> I guess I can always break the for loop up into four parts and
> >>> store the state at the end of each, but that's an unsatisfying
> >>> solution to me.
> >>>
> >>> Jim, I like the suggestion of just pulling one big sample, but
> >>> since I can get the runtimes under 30 minutes just by removing the
> >>> storage piece I doubt I would see any noticeable changes by
> >>> pulling large sample vectors.
> >>>
> >>> Thanks,
> >>> Michael
> >>>
> >>> On Tue, Oct 26, 2010 at 6:22 AM, Jim Lemon wrote:
> >>>
> >>>> On 10/26/2010 04:50 PM, Michael D wrote:
> >>>>
> >>>>> So I'm in a stochastic simulations class and I am having issues
> >>>>> with the amount of time it takes to run the Ising model.
> >>>>>
> >>>>> I usually don't like to attach the code I'm running, since it
> >>>>> will probably make me look like a fool, but I figure it's the
> >>>>> best way I can find any bits I can speed up run time.
> >>>>>
> >>>>> As for the goals of the exercise:
> >>>>> I need the state of the system at time=1, 10k, 100k, 1mill, and
> >>>>> 10mill and the percentage of vertices with positive spin at all t
> >>>>>
> >>>>> Just to be clear, I'm not expecting anyone to tell me how to
> >>>>> program this model, cause I know what I have works for this
> >>>>> exercise, but it takes far too long to run and I'd like to speed
> >>>>> it up by replacing slow operations wherever possible.
> >>>>>
> >>>> Hi Michael,
> >>>> One bottleneck is probably the sampling. If it doesn't grab too
> >>>> much memory, setting up a vector of the samples (maybe a million
> >>>> at a time if 10 million is too big - might be able to rewrite
> >>>> your sample vector when you store the state) and using k (and an
> >>>> offset if you don't have one big vector) to index it will give
> >>>> you some speed.
> >>>>
> >>>> Jim
> >>>>
> >>>
> >>> [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
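
For anyone landing on this thread later, here is a minimal sketch of how the == checkpoint idea from Bill's timing comparison might be applied to the state-storing step. It is not Michael's actual code: the 4x4 matrix M, the placeholder "flip one entry" update, and the shortened run (checkpoints at 1e2 through 1e5 instead of 1e4 through 1e7) are all assumptions made so the example runs quickly. Out, q, and nextCheckPoint follow the names used in the thread.

```r
# Hypothetical stand-ins so the sketch runs in well under a second.
n_steps <- 1e5L                      # the thread's run is 1e7 steps
n_checkpoints <- 4L                  # states stored at 1e2, 1e3, 1e4, 1e5 here

M <- matrix(1L, nrow = 4, ncol = 4)  # toy state; the real lattice is larger
Out <- vector("list", n_checkpoints) # preallocated storage for saved states
q <- 0L
nextCheckPoint <- 1e2                # the thread starts at 1e4

for (k in seq_len(n_steps)) {
  # Placeholder update (assumption): flip one entry per step.
  i <- (k - 1L) %% 16L + 1L
  M[i] <- -M[i]

  # One scalar comparison per iteration instead of %in% on a vector.
  if (k == nextCheckPoint) {
    q <- q + 1L
    Out[[q]] <- M
    nextCheckPoint <- nextCheckPoint * 10
  }
}

# The branch of an if is only evaluated when the condition selects it,
# so this stop() is never called.
if (FALSE) stop("never evaluated")
```

The comparison k == nextCheckPoint mixes an integer with a double, which R handles by coercion; the checkpoints here are exactly representable, so the equality test is safe.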