This article might be interesting here: https://www.dell.com/support/article/en-uk/sln319015/amd-rome-is-it-for-real-architecture-and-initial-hpc-performance?lang=en
And hello, Joshua. Long time no see.

On Sun, 25 Oct 2020 at 23:11, Joshua Mora <joshua_m...@usa.net> wrote:
> Reach out to AMD; they have specific instructions (including BIOS/OS
> settings) and even binaries on how to get the best performance.
> Don't go by trial and error, as it is very time consuming.
> BLIS also has multiple parameters, as it has nested loops, so you could
> also have to try multiple configurations to get the optimal performance.
> Just reach out to them.
>
> Joshua
>
> ------ Original Message ------
> Received: 04:30 PM CDT, 08/14/2020
> From: Richard Walsh <rbwcn...@gmail.com>
> To: Beowulf List <beowulf@beowulf.org>
> Subject: [Beowulf] Best case performance of HPL on EPYC 7742 processor ...
>
> > All,
> >
> > What have people achieved on this SKU on a single node using the stock
> > HPL 2.3 source... ??
> >
> > I have seen a variety of performance claims, some as high as 90% of its
> > nominal per-node peak of 4.608 TFLOPs. I can now get above 80% of peak,
> > but not higher. I have heard that getting higher values requires special
> > BIOS settings, including turning off SMT, which allows the chip to turbo
> > higher. Remember, this is not the 7542 processor, which has 32 cores per
> > chip and the same bandwidth per socket as the 7742 and can turbo to over
> > 100% of nominal peak for HPL.
> >
> > If people have gotten higher single-node numbers ... what is your
> > recipe ... ??
> >
> > I am particularly interested in BIOS settings, and maybe surprise
> > settings in the HPL.dat file. Do higher-performing runs require using
> > close to the maximum memory on the node ... ?? As this is a single-node
> > run, I would not expect the choice of MPI to make a difference.
> >
> > To get to 80% with SMT on in the BIOS, I am building with an older Intel
> > compiler and MKL that still recognize MKL_DEBUG_CPU_TYPE=5. Running
> > with the number of MPI ranks on the node matching the number of CCXs
> > seems to give the best numbers.
> >
> > Following the tuning instructions from AMD for using BLIS and GCC for
> > the build does not get me there.
> >
> > Thanks,
> >
> > Richard Walsh
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
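For anyone checking the numbers in Richard's question, here is a minimal Python sketch of the usual back-of-envelope arithmetic. It assumes a dual-socket node, the 7742's 2.25 GHz base clock, 16 double-precision FLOPs per cycle per Zen 2 core, and 4 cores per CCX; none of those figures come from the thread itself. Sizing N to about 80% of memory and choosing a near-square P x Q grid with P <= Q are common HPL rules of thumb, not anything AMD prescribes, and the 256 GiB node and NB = 232 are hypothetical placeholders.

```python
import math

# Assumptions (not from the thread): dual-socket node, 2.25 GHz base clock,
# 16 double-precision FLOPs per cycle per Zen 2 core
# (two 256-bit FMA pipes x 4 doubles x 2 ops per FMA), 4 cores per CCX.
SOCKETS = 2
CORES_PER_SOCKET = 64
BASE_GHZ = 2.25
DP_FLOPS_PER_CYCLE = 16
CORES_PER_CCX = 4


def peak_gflops(sockets, cores, ghz, flops_per_cycle):
    """Nominal peak = sockets x cores per socket x clock (GHz) x FLOPs/cycle."""
    return sockets * cores * ghz * flops_per_cycle


def hpl_problem_size(mem_bytes, nb, fraction=0.8):
    """Largest N, rounded down to a multiple of NB, whose N x N matrix of
    8-byte doubles fits in `fraction` of memory (80% is a rule of thumb)."""
    n = int(math.sqrt(fraction * mem_bytes / 8))
    return (n // nb) * nb


def process_grid(ranks):
    """Near-square P x Q grid with P <= Q and P * Q == ranks."""
    p = math.isqrt(ranks)
    while ranks % p:
        p -= 1
    return p, ranks // p


if __name__ == "__main__":
    peak = peak_gflops(SOCKETS, CORES_PER_SOCKET, BASE_GHZ, DP_FLOPS_PER_CYCLE)
    print(f"nominal peak: {peak:.0f} GFLOPs")  # prints 4608, i.e. 4.608 TFLOPs
    ranks = SOCKETS * CORES_PER_SOCKET // CORES_PER_CCX  # one MPI rank per CCX
    print("P x Q for", ranks, "ranks:", process_grid(ranks))  # (4, 8)
    # 256 GiB and NB = 232 are hypothetical placeholders, not recommendations.
    print("N:", hpl_problem_size(256 * 2**30, nb=232))
```

With 4 cores per CCX, one rank per CCX works out to 32 ranks on the node, which the grid helper lays out as 4 x 8; whether filling ~80% of memory actually helps on this part is exactly the open question above, so treat N as a starting point.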