Thanks for the explanation. I've always found the documentation on HPCG to be lacking, and what I remember reading described it as a more holistic approach to benchmarking, which I took to mean it stressed the whole system, not just one subsystem.

I'll do a search for presentations from the BOFs. If you can send me the PDFs you referenced below, I'd be grateful.

Prentice

On 3/21/22 8:42 PM, Massimiliano Fatica wrote:
No, HPCG is all memory bandwidth.
You can see in this old presentation that GPUs with basically no double-precision capability perform on par with others that have 10x the FP64 performance:

http://www.hpcg-benchmark.org/downloads/sc14/HPCG_BOF.pdf

There were more examples during recent HPCG BOFs (I can't find the PDFs online, but I can send them to you if you want). For example, if you look at the specs of a K80 (2x GK210, 1.4 TF DP, a 384-bit memory bus at 5 GHz per GPU) and an M40 (GM200, 0.2 TF DP, a 384-bit memory bus at 6 GHz), you might think the K80 would be much faster. It is exactly the opposite, and the results scale with memory bandwidth: two M40s have ~576 GB/s of aggregate theoretical bandwidth (2 x 288 GB/s) versus ~480 GB/s for one K80 (2 x 240 GB/s), even though the K80 has 3.5x the aggregate FP64 peak (1.4 TF vs 2 x 0.2 TF).

*1 x K80 (2 GK210 GPUs), ECC enabled, clk=875*
2x1x1 process grid
256x256x256 local domain
SpMV  =  49.1 GF ( 309.1 GB/s Effective)    24.5 GF_per ( 154.6 GB/s Effective)
SymGS =  62.2 GF ( 480.2 GB/s Effective)    31.1 GF_per ( 240.1 GB/s Effective)
total =  58.7 GF ( 445.3 GB/s Effective)    29.4 GF_per ( 222.7 GB/s Effective)
final =  55.1 GF ( 417.5 GB/s Effective)    27.5 GF_per ( 208.8 GB/s Effective)

*2 x M40 (2 GM200 GPUs), ECC enabled, clk=1114*
2x1x1 process grid
256x256x256 local domain
SpMV  =  69.4 GF ( 437.2 GB/s Effective)    34.7 GF_per ( 218.6 GB/s Effective)
SymGS =  83.7 GF ( 645.7 GB/s Effective)    41.8 GF_per ( 322.8 GB/s Effective)
total =  79.6 GF ( 603.7 GB/s Effective)    39.8 GF_per ( 301.9 GB/s Effective)
final =  74.2 GF ( 562.7 GB/s Effective)    37.1 GF_per ( 281.4 GB/s Effective)
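
A quick back-of-the-envelope check (a sketch in Python; the peak
bandwidth is just bus width times data rate, and the bytes-per-flop
figure is derived from the numbers above, not from any spec sheet):

    # Sanity check that the HPCG results above track memory bandwidth,
    # not FP64 peak. All figures come from the quoted runs and specs.

    def peak_bw_gbs(bus_bits, gt_per_s, n_gpus):
        """Aggregate theoretical memory bandwidth in GB/s."""
        return bus_bits / 8 * gt_per_s * n_gpus

    k80_bw = peak_bw_gbs(384, 5.0, 2)  # 2x GK210: 240 GB/s each -> 480
    m40_bw = peak_bw_gbs(384, 6.0, 2)  # 2x GM200: 288 GB/s each -> 576

    k80_gf, m40_gf = 55.1, 74.2        # measured "final" HPCG results

    print(f"peak bandwidth ratio M40/K80: {m40_bw / k80_bw:.2f}")  # 1.20
    print(f"HPCG 'final' ratio   M40/K80: {m40_gf / k80_gf:.2f}")  # 1.35

    # The effective traffic per flop is essentially identical on both
    # systems (417.5/55.1 and 562.7/74.2 are both ~7.6 bytes/flop), so
    # achieved GF is just achieved bandwidth divided by a constant.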

Regarding Linpack: on CPU systems the trailing matrix update is slow, so you can hide all the network traffic behind it with the look-ahead, provided you have a decent network (most CPU-only systems on the list are not real HPC systems, just OEMs stuffing the list with cloud systems that have very poor networks). On accelerated systems (GPUs, for example), the network becomes really critical.
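
A deliberately simplified sketch of that look-ahead (stub routines,
not real HPL code): the factorization and broadcast of the next panel
run concurrently with the current trailing update, so a long update,
as on CPUs, hides the communication, while a short one, as on GPUs,
exposes it.

    # Toy look-ahead schedule with hypothetical stubs for the real work.
    from concurrent.futures import ThreadPoolExecutor

    def factor_and_broadcast(k):
        """Factor panel k and broadcast it (the network traffic)."""

    def trailing_update(k):
        """The big DGEMM update that dominates runtime on CPU systems."""

    def hpl_outer_loop(n_panels):
        with ThreadPoolExecutor(max_workers=1) as comm:
            pending = comm.submit(factor_and_broadcast, 0)
            for k in range(n_panels):
                pending.result()              # panel k must have arrived
                if k + 1 < n_panels:          # kick off the next panel
                    pending = comm.submit(factor_and_broadcast, k + 1)
                trailing_update(k)            # comm for k+1 hides in here

    # (Real HPL first applies the update to panel k+1's columns before
    # factoring it; this toy version skips that detail.)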

Now, memory bw is the real limitation in most HPC workloads, so if I had to select a system, I would care more about memory bw than HPL.

M


On Mon, Mar 21, 2022 at 11:51 AM Prentice Bisbal via Beowulf <beowulf@beowulf.org> wrote:

    M,

    Isn't it more accurate to say that HPCG measures the whole system
    more realistically, and memory bandwidth happens to be the
    "rate-limiting step" in just about all architectures? Even with
    LINPACK, which should be CPU-bound, the Top500 list shows that HPL
    results are affected by the network. For example, there's this
    article, which is a bit old but I think still applies (doing the
    same analysis on the current Top500 list is on my to-do list,
    actually):

    
    https://www.nextplatform.com/2015/07/20/ethernet-will-have-to-work-harder-to-win-hpc/

    On 3/18/22 8:34 PM, Massimiliano Fatica wrote:
    HPCG measures memory bandwidth; the FLOPS capability of the chip
    is completely irrelevant.
    Pretty much all the vendor implementations reach very similar
    efficiency if you compare them to the available memory bandwidth.
    There is some effect from the network at scale, but you need a
    really large system to see it in play.

    M

    On Fri, Mar 18, 2022 at 5:20 PM Brian Dobbins
    <bdobb...@gmail.com> wrote:


        Hi Jorg,

          We (NCAR - weather/climate applications) tend to find that
        HPCG tracks the performance we see from hardware more closely
        than Linpack, so it is definitely of interest and watched. But
        our procurements tend to use actual code that vendors run as
        part of the process, so we don't 'just' use published HPCG
        numbers. Still, I'd say it's very much a useful number.

          As one example, while I haven't seen HPCG numbers for the
        MI250X accelerators, Prof. Matsuoka of RIKEN tweeted back in
        November that he anticipated it would score around 0.4% of
        peak on HPCG, vs 2% on the NVIDIA A100 (while the A64FX they
        use hits an impressive 3%):
        https://twitter.com/ProfMatsuoka/status/1458159517590384640

          Why is that relevant?  Well, /on paper/, the MI250X has ~96
        TF FP64 w/ matrix operations, vs 19.5 TF on the A100, so ~5x
        in theory. But Prof. Matsuoka anticipated a ~5x differential
        in HPCG /efficiency/ in the other direction (0.4% vs 2% of
        peak), /erasing/ that advantage. Now, surely /someone/ has
        HPCG numbers on the MI250X, but I've not yet seen any. Would
        love to know what they are. But absent that information, I
        tend to bet Matsuoka isn't far off the mark.
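
          Making that arithmetic explicit (a trivial check, using
        only the figures above):

            # Peak FP64 (TF) times the anticipated HPCG efficiency.
            mi250x = 96.0 * 0.004   # ~0.38 TF
            a100   = 19.5 * 0.02    # ~0.39 TF
            print(mi250x, a100)     # near-identical HPCG estimates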

          Ultimately, it may help to know more about what kind of
        applications you run - for memory-bound CFD-like codes, HPCG
        tends to be pretty representative.

          Maybe it's time to update the saying that 'numbers never
        lie' to something more accurate - 'numbers never lie, but
        they also rarely tell the whole story'.

          Cheers,
          - Brian


        On Fri, Mar 18, 2022 at 5:08 PM Jörg Saßmannshausen
        <sassy-w...@sassy.formativ.net> wrote:

            Dear all,

            Further to the emails back in 2020 about the HPCG
            benchmark test: as we are in the process of getting a new
            cluster, I was wondering whether somebody else has in the
            meantime used that test to benchmark the particular
            performance of their cluster.
            From what I can see, the latest HPCG version is 3.1 from
            August 2019. I have also noticed that their website has a
            link to download a version which includes support for the
            latest A100 GPUs from NVIDIA:
            https://www.hpcg-benchmark.org/software/view.html?id=280
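
            As far as I can tell, the test is driven by a small
            hpcg.dat input file; a typical one looks roughly like this
            (the third line is the local grid size per MPI rank, the
            fourth the run time in seconds; official submissions need
            at least 1800 s, but short runs should do for comparing
            offers):

                HPCG benchmark input file
                Sandia National Laboratories; University of Tennessee, Knoxville
                104 104 104
                60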

            What I was wondering is: has anybody else apart from
            Prentice tried that test, and is it actually useful, or
            does it just give you another set of numbers?

            Our new cluster will not be in the same league as the
            supercomputers, but we would like to have at least some
            kind of handle so we can compare the various offers from
            vendors. My hunch is that the benchmark will somehow
            (strongly?) depend on how it is tuned. As my former
            colleague used to say: I am looking for some war stories
            (not a very apt phrase these days!).

            Either way, I hope you are all well given the strange new
            world we are living
            in right now.

            All the best from a spring-like, dark London

            Jörg



_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
