Brian,

> Yes, if we're talking about developing entirely new methods.  But there's a
> /ton/ of low-hanging fruit that exists in the mix of systems tuning, compiler
> options, /basic/ code changes (not anything deep or time-consuming), etc.,
> that takes hours or, at most, a few days, and can have massive impacts.  The
> serial I/O -> parallel I/O example way (way, way) above being one example,
> and as another, I can't tell you the number of times I've seen people running
> production runs with '-O0 -g' as their compilation flags.  Or, not using
> tuned BLAS / LAPACK libraries.  Or running an OpenMP-enabled code with 1
> process per node but having OMP_NUM_THREADS set to 1 instead of 8.  Or
> countless other things.
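To make that last OMP_NUM_THREADS trap concrete, here is a minimal sanity check along those lines; it is purely an illustrative sketch (the file name and build line are placeholders, not anything from this thread), but it prints how many threads an OpenMP build actually gets at runtime:

/* thread_check.c -- illustrative sketch: report how many OpenMP threads
 * the runtime will actually use, which catches the
 * "OMP_NUM_THREADS=1 on an 8-core node" mistake.
 *
 * Build line is just an example:
 *   gcc -O2 -fopenmp thread_check.c -o thread_check
 */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Upper bound the runtime will use for the next parallel region. */
    printf("omp_get_max_threads() reports %d thread(s)\n",
           omp_get_max_threads());

    /* Actual team size inside a parallel region. */
    #pragma omp parallel
    {
        #pragma omp single
        printf("parallel region is running with %d thread(s)\n",
               omp_get_num_threads());
    }
    return 0;
}

Run something like this once per node before a production job; if it prints 1 on an 8-core node, the environment is the problem, not the code.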
Hours would be optimistic, unless you exclude the initial profiling and other analysis.  In days one can often find significant gains, at least in single-threaded performance.  As a general rule, I allow one week per application for single-threaded optimization and two weeks for parallel optimization.  Of course, the more time one spends, the more one finds.  I am invariably limited by deadlines and/or budgets, but I have never run out of ideas, even after months!

But I am usually given applications for which some attempt at optimization has already been made, so I take your point about those really dumb mistakes.  They bring me joy. :)

(By the way, use of '-O0' can have another unintended consequence: because '-O2' is often the most thoroughly tested level, there may be more compiler bugs at '-O0', so it is NOT the safest optimization setting.)

Benchmarkers do not need to be on staff, so scale their cost to the project term and reevaluate the cost equation.

Cheers,
Max