Rahul Nabar wrote:
> On Mon, May 11, 2009 at 12:23 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>> If you don't feel like running the HPL benchmark (it is fun,
>> but time consuming) to get your actual Gigaflops
>> (Rmax in Top500 jargon),
>> you can look up in the Top500 list the Rmax/Rpeak ratio for clusters
>> with hardware similar to yours.
>> You can then apply this factor to your Rpeak calculated as above,
>> to get a reasonable guess for your Rmax.
>> This may be good enough for the purpose you mentioned.
> Rmax/Rpeak = 0.83 seems a good guess based on one very similar system
> on the Top500.
> Thus I come up with a number of around 1.34 TeraFLOPS for my cluster
> of 24 servers. Does that value seem a reasonable ballpark? Nothing too
> accurate, but I do not want to be an order of magnitude off [maybe a
> decimal mistake in the math!].
> Hardware:
> Dell PowerEdge SC1345s. All 64-bit machines with a dual-channel
> bonded Gigabit Ethernet interconnect. Quad-Core AMD Opteron(tm)
> Processor 2354.
> P.S. The Athelon earlier was my typo, sorry!
Hi Rahul, list
You may have read my other posting with the
actual HPL Rmax/Rpeak = 83.4%
that I measured here with quad-core AMD Opteron 2376 (Shanghai) processors.
This matches the number you found on Top500.
Our clusters are very similar, 24 nodes, 192 cores,
AMD 3rd-generation Opterons (Barcelona and Shanghai), right?
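For what it is worth, here is the back-of-envelope arithmetic I would do
to check your 1.34 TFLOPS figure. It is only a sketch: the 2.2 GHz clock
and the 4 double-precision flops per core per cycle for Barcelona are my
assumptions about your hardware, not numbers from your posting.
A few lines of Python, just for the arithmetic:

# Back-of-envelope Rpeak / Rmax estimate for 24 dual-socket,
# quad-core Opteron 2354 (Barcelona) servers.
# Assumed (not from the posts): 2.2 GHz clock, 4 DP flops/core/cycle.

nodes            = 24
sockets_per_node = 2
cores_per_socket = 4
clock_ghz        = 2.2        # Opteron 2354 (assumed)
flops_per_cycle  = 4          # Barcelona: 2 adds + 2 muls per cycle (DP)

cores = nodes * sockets_per_node * cores_per_socket      # 192 cores
rpeak_gflops = cores * clock_ghz * flops_per_cycle       # ~1690 GFlops

rmax_ratio  = 0.834           # the HPL efficiency I measured with IB
rmax_gflops = rpeak_gflops * rmax_ratio                  # ~1409 GFlops

print(f"Rpeak ~ {rpeak_gflops:.0f} GFlops, "
      f"estimated Rmax ~ {rmax_gflops:.0f} GFlops")

That comes out near 1.4 TFLOPS, so your ~1.34 TFLOPS is certainly in the
right ballpark, and nowhere near an order of magnitude off.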
So ~83% is what you should expect if you use InfiniBand (which I used)
on a cluster of this size, with these processors and a single IB switch.
For Gigabit Ethernet I would guess the number is lower.
The (nominal) signaling rate of 4X DDR InfiniBand
is 20 Gb/s = 20 x 1 GigE (though with 8b/10b encoding the usable data
rate is 16 Gb/s, so the factor is really 16, not 20).
I have yet to try HPL over GigE to have numbers to compare,
but it just takes too long to run HPL over a decent range of parameters,
and I am reluctant to stop production to do it.
However, if the computation/communication ratio of your real
computational chemistry application is high,
the interconnect may not be so important; GigE may be perfectly good,
I would guess.
I.e., give each core enough numbers to crunch (to increase computation),
rather than splitting the task across too many of them (to decrease
communication).
If you can run each job on a single node (shared memory), even better.
That is, as long as each node has enough RAM to fit the whole job
without triggering memory swapping.
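As a rough illustration of the memory-fit point, here is the rule of
thumb I would use for a dense double-precision problem like HPL. The
16 GB of RAM per node is a hypothetical figure, not something from your
posting; substitute your own.

# Largest N x N double-precision matrix that fits in one node's RAM
# without swapping.  16 GB per node is hypothetical; adjust as needed.

ram_gb   = 16
use_frac = 0.8                  # leave ~20% for the OS, MPI buffers, etc.

ram_bytes = ram_gb * 2**30
n_max = int((use_frac * ram_bytes / 8) ** 0.5)   # 8 bytes per double

print(f"Largest N that fits in ~{use_frac:.0%} of {ram_gb} GB: about {n_max}")

If the job needs more memory than one node has, it must be spread across
nodes, and then the interconnect starts to matter again.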
My single-node HPL test gave Rmax/Rpeak = 84.6%.
(I have yet to try it with processor affinity turned on,
which may be a little better.)
So, one node was just a bit better than the 83.4%
across the whole cluster.
However, the difference may be larger if using GigE instead of IB
on the cluster.
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------