Ashley Pittman wrote:
> On Mon, 2009-05-11 at 15:09 -0400, Gus Correa wrote:
>> Mark Hahn wrote:
>> I haven't checked the Top500 list in detail,
>> but I think you are right about 80% being fairly high.
>> (For big clusters perhaps?).
>
> Other way around, maintaining a high efficiency rating at large node
> counts is a very difficult problem so larger clusters tend to have
> smaller values.

Hi Ashley, list
I may have phrased it poorly.
I meant exactly what you said, i.e., that it is more difficult
to keep high efficiency in a large installation (particularly w.r.t.
network latency, I would guess) than in a small single-switch cluster.
"Small is better", or easier, perhaps. :)

>> In the original email I mentioned that Roadrunner (top500 1st),
>> has Rmax/Rpeak ~= 76%.
>> However, without any particular expertise or too much effort,
>> I got 83.4% Rmax here. :)
>> I was happy with that number,
>> until somebody in the OpenMPI list told me that
>> "anything below 85%" needs improvement. :(
>
> At 24 nodes that's probably a reasonable statement.
>
> Ashley,

Thank you.
It is an encouragement to seek better performance.
However, considering other postings that emphasized the importance of
memory size (and problem size N) for HPL performance,
I wonder if there is still room for significant improvement
with my 16GB/node (out of a possible maximum of 128GB/node,
which we don't plan to buy, of course).
With the memory I currently have, I can't make the problem much bigger
than the N=196,000 I've been using.
(That follows the "use ~80% of your memory" HPL rule of thumb.)
Maybe N can grow a bit more, but not by much,
as I am already close to triggering memory swapping.
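
For reference, that N=196,000 comes straight from the rule of thumb:
N is roughly sqrt(0.8 * aggregate memory / 8 bytes per double).
A quick sketch in Python (the helper name is mine, decimal GB is
assumed, and NB=200 is only an illustrative block size, not
necessarily the one I actually run with):

import math

def hpl_n(nodes, gb_per_node, mem_fraction=0.80, nb=200):
    """Largest N (rounded down to a multiple of NB) whose N x N
    double-precision matrix fits in mem_fraction of aggregate memory."""
    total_bytes = nodes * gb_per_node * 1e9        # decimal GB assumed
    n = math.sqrt(mem_fraction * total_bytes / 8)  # 8 bytes per double
    return int(n // nb) * nb

print(hpl_n(24, 16))    # ~196,000 with 16GB/node
print(hpl_n(24, 128))   # ~554,000 with 128GB/node
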
So far, varying the HPL parameters, using processor affinity, etc.,
hasn't shown significant improvement.
The sweet spots for NB, P, and Q are clear.
I have not tried other compilers, though, only GNU, with optimization
flags appropriate for the Opteron Shanghai.
I wonder if I have reached the HPL "saturation point" for this memory size.
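
For what it is worth, the P and Q sweet spot I see agrees with the
usual guidance of a process grid as close to square as possible, with
P <= Q.  A small helper along those lines (the function name is mine,
and the process counts in the example are just placeholders, not
necessarily my actual core count):

def hpl_grid(nprocs):
    """Return (P, Q) with P*Q == nprocs, P <= Q, and P as large as possible."""
    p = int(nprocs ** 0.5)
    while nprocs % p:
        p -= 1
    return p, nprocs // p

print(hpl_grid(96))    # (8, 12),  e.g. 24 nodes x 4 cores
print(hpl_grid(192))   # (12, 16), e.g. 24 nodes x 8 cores
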
I also wonder whether, if the 24 nodes had the full 128GB/node of RAM,
which would give me a maximum problem size of N=554,000
(and a really long walltime to run HPL!),
there would be a significant increase in performance.
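
As a rough sanity check on that walltime: HPL performs about
(2/3)*N^3 + 2*N^2 floating-point operations, so the runtime grows with
N^3.  A quick estimate (the 1500 GFlop/s below is only a placeholder
sustained rate, not my measured Rmax):

def hpl_walltime_hours(n, rmax_gflops):
    """Approximate HPL runtime in hours at a sustained rate of rmax_gflops."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / (rmax_gflops * 1e9) / 3600.0

print(hpl_walltime_hours(196000, 1500))   # ~1 hour at N=196,000
print(hpl_walltime_hours(554000, 1500))   # ~21 hours at N=554,000
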
What do you think?
Has anybody run HPL benchmarks with nodes "full of memory"? :)
Thank you,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf