Hi Tom, Greg, Rahul, list
Tom Elken wrote:
On Behalf Of Rahul Nabar
Rmax/Rpeak = 0.83 seems a good guess based on one very similar system
on the Top500.
Thus I come up with a number of around 1.34 TeraFLOPS for my cluster
of 24 servers. Does that value seem a reasonable ballpark? It does not
need to be very accurate, but I do not want to be off by an order of
magnitude [maybe from a decimal mistake in the math!].
You're in the right ballpark.
I recently got 0.245 TFLOPS on HPL on a 4-node version of what you have
(with Goto BLAS), so 6x that number (about 1.47 TFLOPS) is in the same
ballpark as your 1.34 TFLOPS estimate.
My CPUs were 2.3 GHz Opteron 2356s instead of your 2.2 GHz parts.
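For anyone who wants to redo the arithmetic, here is a minimal Python
sketch of the peak-flops estimate. It assumes dual-socket, quad-core
nodes (the socket count is my assumption, not something stated above)
and 4 double-precision flops per core per cycle for these
Barcelona-class Opterons.

FLOPS_PER_CYCLE = 4        # DP flops/cycle/core on Barcelona (K10)
CORES_PER_NODE = 2 * 4     # ASSUMED: 2 sockets x 4 cores per node

def rpeak_tflops(nodes, ghz):
    """Theoretical peak (Rpeak) in TFLOPS for a cluster of identical nodes."""
    return nodes * CORES_PER_NODE * ghz * FLOPS_PER_CYCLE / 1000.0

# Tom's 4-node, 2.3 GHz Opteron 2356 system:
rpeak_4 = rpeak_tflops(4, 2.3)                  # ~0.294 TFLOPS
print("4 nodes:  Rpeak ~ %.3f TF, 0.245 TF measured -> %.1f%% of peak"
      % (rpeak_4, 100 * 0.245 / rpeak_4))

# Rahul's 24-node, 2.2 GHz Opteron 2354 cluster:
rpeak_24 = rpeak_tflops(24, 2.2)                # ~1.69 TFLOPS
print("24 nodes: Rpeak ~ %.2f TF, x 0.83 -> %.2f TF expected Rmax"
      % (rpeak_24, 0.83 * rpeak_24))

With those assumptions, the 4-node run comes out at roughly 83% of
peak, and 0.83 x Rpeak for the 24-node cluster is about 1.40 TFLOPS,
i.e. the same ballpark as the 1.34 TFLOPS estimate.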
Greg is also right on the memory size being a factor allowing larger N
to be used for HPL.
I used a pretty small N on this HPL run since we were running it
as part of an HPC Challenge suite run, and a smaller N can be better
for PTRANS if you are interested in the non-HPL parts of HPCC (as I was).
I have 16 GB/node; the maximum possible is 128 GB for this motherboard.
I have tried only two problem sizes, N=50,000 and N=196,000, the latter
being approximately the largest the cluster can run without swapping.
(The HPL documentation suggests aiming at about 80% of total memory as
a rule of thumb.)
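As a quick sanity check on that rule of thumb: the N x N
double-precision HPL matrix takes 8*N^2 bytes, so the largest sensible
N is roughly sqrt(0.80 * total_memory / 8). A small sketch, assuming
(purely for illustration) 24 nodes with 16 GiB each; the node count is
my guess, not a figure from this thread:

import math

def hpl_n_from_memory(total_mem_gib, fraction=0.80):
    """Largest HPL problem size N whose 8*N^2-byte matrix uses roughly
    `fraction` of total memory (the usual HPL rule of thumb)."""
    total_bytes = total_mem_gib * 2**30
    return int(math.sqrt(fraction * total_bytes / 8))

# Hypothetical 24 nodes x 16 GiB/node = 384 GiB total:
print(hpl_n_from_memory(24 * 16))              # ~203,000

# Memory footprint of the two problem sizes mentioned above:
for n in (50_000, 196_000):
    print("N=%d: matrix ~ %.0f GiB" % (n, 8 * n**2 / 2**30))

Under that assumption, N=196,000 (about 286 GiB of matrix) sits just
under the 80% mark, which matches it being close to the swap limit.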
It is true that performance at large N (1.49 TFLOPS, Rmax/Rpeak = 83.6%)
is much better than at small N (1.23 TFLOPS, Rmax/Rpeak = 70%).
However, here is somebody who ran an experiment with increasing values
of N, and his results suggest that performance increases roughly
logarithmically with problem size N, not linearly, saturating as you
get close to the maximum that fits in your current memory:
http://www.calvin.edu/~adams/research/microwulf/performance/HPL/
Of course, your memory size is whatever you have installed, but it
could be as large as your motherboard (and your budget) allow.
Questions for the HPL experts:
Would I get a significant increase in performance if the nodes were
outfitted with the maximum of 128 GB of RAM each,
instead of the current 16 GB?
Would I get, say, Rmax/Rpeak=90% or better?
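For scale, the same sizing formula with the hypothetical 24-node count
and 128 GiB per node (about 3 TiB total) would allow a much larger
problem:

import math
# Hypothetical 24 nodes x 128 GiB/node, 80% of memory for the HPL matrix:
print(int(math.sqrt(0.80 * 24 * 128 * 2**30 / 8)))   # ~574,000

Whether that much larger N actually pushes efficiency toward 90% is
exactly the question above; the Microwulf data linked earlier suggest
the gains flatten out as N approaches the memory limit.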
All 64-bit machines with a dual-channel bonded Gigabit Ethernet
interconnect: Quad-Core AMD Opteron(tm) Processor 2354.
As others have said, 50% is a more likely HPL efficiency for a large GigE
cluster, but with your smallish cluster (24 nodes) and bonded channels,
you would probably get closer to 80% than 50%.
Thank you.
That clarifies things a bit.
Are "bonded channels" what you get in a single switch?
So, it is "small is better", right? :)
How about InfiniBand: would the same principle apply, with a small
cluster on a single switch being more efficient than a large one with
stacked switches?
Thank you,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
-Tom
PS. The "Athelon" earlier was my typo, sorry!
--
Rahul
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf