For memory bandwidth, single-node tests such as LIKWID are helpful:
https://github.com/RRZE-HPC/likwid
MPI communication benchmarks are a good complement to this.
Full applications do more than the above, but these are easier starting
points that require less domain-specific application knowledge for
general performance measurement.
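As a rough illustration of the kind of single-node measurement such tools automate, here is a minimal NumPy sketch of a STREAM-style triad that estimates sustainable memory bandwidth. This is only a sanity check, not a substitute for LIKWID's pinned, hardware-counter-backed measurements; the array size and repeat count are arbitrary assumptions:

```python
# Minimal STREAM-triad-style bandwidth estimate (a rough sanity check only;
# likwid-bench gives properly pinned, counter-validated numbers).
import time
import numpy as np

N = 20_000_000          # ~160 MB per array: large enough to defeat caches
a = np.zeros(N)
b = np.random.rand(N)
c = np.random.rand(N)
scalar = 3.0

best = float("inf")
for _ in range(5):
    t0 = time.perf_counter()
    a[:] = b + scalar * c   # triad: a = b + scalar * c
    best = min(best, time.perf_counter() - t0)

# Triad nominally moves 3 arrays of 8-byte doubles (2 reads + 1 write).
# NumPy materializes scalar * c as a temporary, so real traffic is higher
# and this slightly underestimates the hardware's bandwidth.
gbytes = 3 * N * 8 / 1e9
print(f"Triad bandwidth: {gbytes / best:.1f} GB/s")
```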
On 3/19/22 3:58 AM, Richard Walsh wrote:
J,
Trying to add a bit to the preceding useful answers …
In my experience running these codes on very large systems for
acceptance testing, to get optimal HPCG or HPL performance on GPUs
(MI200 or A100) you need to obtain the optimized versions from the
vendors, which include scripts with environment-variable tunings
specific to their versions and optimal affinity settings to manage the
non-simple relationship between the NICs, the GPUs, and the CPUs … you
have to iterate through the settings to find the optimal values for
your system.
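The "iterate through the settings" step can be mechanized as a plain grid search. The sketch below is only illustrative: the variable names and value ranges are hypothetical placeholders (real vendor scripts define their own knobs), but the sweep has this shape:

```python
# Hypothetical sketch of sweeping affinity/launcher tunings; the variable
# names and values here are placeholders, not real vendor knobs.
import itertools

tunables = {
    "GPU_PER_RANK_BINDING": ["closest", "spread"],       # hypothetical
    "NIC_SELECTION_POLICY": ["nearest", "round-robin"],  # hypothetical
    "CPU_CORES_PER_RANK":   ["4", "8", "16"],            # hypothetical
}

combos = [dict(zip(tunables, values))
          for values in itertools.product(*tunables.values())]

for env in combos:
    # In a real sweep you would launch the benchmark under each setting,
    # e.g. subprocess.run(["mpirun", ...], env={**os.environ, **env}),
    # record the reported GFLOP/s, and keep the best combination.
    print(env)

print(f"{len(combos)} combinations to try")
```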
If you set out to do this on your own, the chances of getting values
similar to those posted on the TOP500 website are vanishingly small …
As already noted, buyers of large HPC systems almost always require
large-scale runs of both HPCG (to demonstrate peak bandwidth) and HPL
(to demonstrate peak processor performance).
Cheers!
rbw
On Mar 18, 2022, at 7:35 PM, Massimiliano Fatica <mfat...@gmail.com>
wrote:
HPCG measures memory bandwidth; the FLOPS capability of the chip is
completely irrelevant.
Pretty much all the vendor implementations reach very similar
efficiency when compared against the available memory bandwidth.
There is some effect of the network at scale, but you need to have a
really large system to see it in play.
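A back-of-the-envelope version of this point: HPCG's dominant kernel is sparse matrix-vector work, whose arithmetic intensity is very low, so achievable GFLOP/s is roughly bandwidth divided by bytes moved per flop. The sketch below assumes ~6 bytes/flop, a ballpark figure for CRS-format SpMV (8-byte value + 4-byte index per nonzero, 2 flops per nonzero), not a published constant:

```python
# Rough HPCG ceiling from memory bandwidth alone. BYTES_PER_FLOP is an
# assumed ballpark for CRS-format SpMV, not a published constant.
BYTES_PER_FLOP = 6.0  # assumption

def hpcg_ceiling_gflops(bandwidth_gb_s: float) -> float:
    """Bandwidth-bound upper estimate of HPCG GFLOP/s."""
    return bandwidth_gb_s / BYTES_PER_FLOP

# Example: a chip with ~1600 GB/s of memory bandwidth (A100-class HBM)
print(f"~{hpcg_ceiling_gflops(1600):.0f} GFLOP/s ceiling")
```

Note how the FLOPS peak never enters the estimate, which is exactly the point being made above.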
M
On Fri, Mar 18, 2022 at 5:20 PM Brian Dobbins <bdobb...@gmail.com> wrote:
Hi Jorg,
We (NCAR - weather/climate applications) tend to find that HPCG
more closely tracks the performance we see from hardware than
Linpack, so it's definitely of interest and closely watched, but
our procurements tend to use actual codes that vendors run as part
of the process, so we don't 'just' use published HPCG numbers.
Still, I'd say it's very much a useful number.
As one example, while I haven't seen HPCG numbers for the MI250X
accelerators, Prof. Matsuoka of RIKEN tweeted back in November that
he anticipated it would score around 0.4% of peak on HPCG, vs 2% on
the NVIDIA A100 (while the A64FX they use hits an impressive 3%):
https://twitter.com/ProfMatsuoka/status/1458159517590384640
Why is that relevant? Well, /on paper/, the MI250X has ~96 TF of
FP64 with matrix operations, vs 19.5 TF on the A100. So, roughly 5x
in theory, but Prof. Matsuoka anticipated a ~5x lower fraction of
peak on HPCG, /erasing/ that differential. Now, surely /someone/
has HPCG numbers on the MI250X, but I've not yet seen any. Would
love to know what they are. But absent that information I tend to
bet Matsuoka isn't far off the mark.
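The arithmetic behind "erasing that differential" is easy to check: multiplying each chip's peak by its anticipated HPCG fraction gives nearly the same absolute number. The peaks and percentages below are the ones quoted above; the fractions are anticipated, not measured:

```python
# Peak FP64 (TF, with matrix ops) times anticipated HPCG fraction of peak.
peak_tf = {"MI250X": 96.0, "A100": 19.5}
hpcg_fraction = {"MI250X": 0.004, "A100": 0.02}  # 0.4% and 2%, anticipated

for chip in peak_tf:
    est = peak_tf[chip] * hpcg_fraction[chip]
    print(f"{chip}: ~{est:.2f} TF anticipated on HPCG")

# 96 * 0.4% = 0.384 TF vs 19.5 * 2% = 0.39 TF: the ~5x peak advantage
# disappears once the efficiency gap is applied.
```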
Ultimately, it may help to know more about what kind of
applications you run; for memory-bound CFD-like codes, HPCG tends
to be pretty representative.
Maybe it's time to update the saying 'numbers never lie' to
something more accurate: 'numbers never lie, but they also rarely
tell the whole story'.
Cheers,
- Brian
On Fri, Mar 18, 2022 at 5:08 PM Jörg Saßmannshausen
<sassy-w...@sassy.formativ.net> wrote:
Dear all,

further to the emails back in 2020 around the HPCG benchmark test: as
we are in the process of getting a new cluster, I was wondering if
somebody else has in the meantime used that test to benchmark the
particular performance of their cluster.

From what I can see, the latest HPCG version is 3.1 from August 2019.
I have also noticed that their website has a link to download a
version which includes the latest A100 GPUs from NVIDIA:
https://www.hpcg-benchmark.org/software/view.html?id=280

What I was wondering is: has anybody else apart from Prentice tried
that test, and is it somehow useful, or does it just give you another
set of numbers? Our new cluster will not be in the same league as the
supercomputers, but we would like to have at least some kind of handle
so we can compare the various offers from vendors. My hunch is that
the benchmark will somehow (strongly?) depend on how it is tuned. As
my former colleague used to say: I am looking for some war stories
(not a very apt thing to say these days!).

Either way, I hope you are all well given the strange new world we are
living in right now.

All the best from a spring-like, dark London

Jörg
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf