Ashley Pittman wrote:
> On Mon, 2009-05-11 at 15:09 -0400, Gus Correa wrote:
>> Mark Hahn wrote:
>> I haven't checked the Top500 list in detail,
>> but I think you are right about 80% being fairly high.
>> (For big clusters perhaps?).
>
> Other way around, maintaining a high efficiency rating at large node
> counts is a very difficult problem so larger clusters tend to have
> smaller values.

Hi Ashley, list
I may have phrased it poorly.
I meant exactly what you said, i.e., that it is more difficult
to keep high efficiency in a large installation (particularly w.r.t.
network latency, I would guess) than in a small single-switch cluster.
"Small is better", or easier, perhaps. :)

>> In the original email I mentioned that Roadrunner (top500 1st),
>> has Rmax/Rpeak ~= 76%.
>> However, without any particular expertise or too much effort,
>> I got 83.4% Rmax here. :)
>> I was happy with that number,
>> until somebody in the OpenMPI list told me that
>> "anything below 85%" needs improvement. :(
>
> At 24 nodes that's probably a reasonable statement.
>
> Ashley,

Thank you.
It is an encouragement to seek better performance.
However, considering other postings that emphasized the importance of
memory size (and problem size N) for HPL performance,
I wonder if there is still room for significant improvement
with my 16GB/node (out of a possible maximum of 128GB/node,
which we don't plan to buy, of course).
With the memory I currently have, I can't make the problem much bigger
than the N=196,000 I've been using.
(That follows the "use ~80% of your memory" HPL rule of thumb.)
Maybe N can grow a bit more, but not by much,
as I am already close to triggering memory swapping.
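
For reference, that N=196,000 comes straight from the rule of thumb:
N is roughly sqrt(0.8 * aggregate memory / 8 bytes per double).
A quick sketch in Python (the helper name is mine, decimal GB is
assumed, and NB=200 is only an illustrative block size, not
necessarily the one I actually run with):

import math

def hpl_n(nodes, gb_per_node, mem_fraction=0.80, nb=200):
    """Largest N (rounded down to a multiple of NB) whose N x N
    double-precision matrix fits in mem_fraction of aggregate memory."""
    total_bytes = nodes * gb_per_node * 1e9        # decimal GB assumed
    n = math.sqrt(mem_fraction * total_bytes / 8)  # 8 bytes per double
    return int(n // nb) * nb

print(hpl_n(24, 16))    # ~196,000 with 16GB/node
print(hpl_n(24, 128))   # ~554,000 with 128GB/node
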
So far, varying the HPL parameters, using processor affinity, etc.,
hasn't shown significant improvement.
The sweet spots for NB, P, and Q are clear.
I have not tried other compilers, though, only GNU, with optimization
flags appropriate for the Opteron Shanghai.
I wonder if I have reached the HPL "saturation point" for this memory size.
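
For what it is worth, the P and Q sweet spot I see agrees with the
usual guidance of a process grid as close to square as possible, with
P <= Q.  A small helper along those lines (the function name is mine,
and the process counts in the example are just placeholders, not
necessarily my actual core count):

def hpl_grid(nprocs):
    """Return (P, Q) with P*Q == nprocs, P <= Q, and P as large as possible."""
    p = int(nprocs ** 0.5)
    while nprocs % p:
        p -= 1
    return p, nprocs // p

print(hpl_grid(96))    # (8, 12),  e.g. 24 nodes x 4 cores
print(hpl_grid(192))   # (12, 16), e.g. 24 nodes x 8 cores
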
I also wonder whether, if the 24 nodes had the full 128GB/node of RAM,
which would give me a maximum problem size of N=554,000
(and a really long walltime to run HPL!),
there would be a significant increase in performance.
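
As a rough sanity check on that walltime: HPL performs about
(2/3)*N^3 + 2*N^2 floating-point operations, so the runtime grows with
N^3.  A quick estimate (the 1500 GFlop/s below is only a placeholder
sustained rate, not my measured Rmax):

def hpl_walltime_hours(n, rmax_gflops):
    """Approximate HPL runtime in hours at a sustained rate of rmax_gflops."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / (rmax_gflops * 1e9) / 3600.0

print(hpl_walltime_hours(196000, 1500))   # ~1 hour at N=196,000
print(hpl_walltime_hours(554000, 1500))   # ~21 hours at N=554,000
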
What do you think?
Has anybody run HPL benchmarks with nodes "full of memory"? :)
Thank you,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf