Dear all, same here: I should have joined the discussion earlier, but I am currently recovering from surgery for a trapped ulnar nerve, so I need to avoid long stretches of typing. As it seems quite apt, I would like to point you to this upcoming talk (copied and pasted):
**********
*Performance Optimizations & Best Practices for AMD Rome and Milan CPUs in HPC Environments*
- date & time: Fri July 2nd 2021, 16:00-17:30 UTC
- speakers: Evan Burness and Jithin Jose (Principal Program Managers for High-Performance Computing in Microsoft Azure)

More information is available at https://github.com/easybuilders/easybuild/wiki/EasyBuild-tech-talks-IV:-AMD-Rome-&-Milan

The talk will be presented via a Zoom session, which registered attendees can join, and will be streamed (and recorded) via the EasyBuild YouTube channel. Q&A will be via the #tech-talks channel in the EasyBuild Slack.

Please register (free of charge) if you plan to attend, via: https://webappsx.ugent.be/eventManager/events/ebtechtalkamdromemilan

The Zoom link will only be shared with registered attendees.
**********

These really are tech talks, not sales talks, and all of the ones I have attended were very informative and friendly. So that might be a good place to ask some questions.

All the best
Jörg

On Sunday, 20 June 2021, 18:28:25 BST, Mikhail Kuzminsky wrote:
> I apologize; I should have written earlier, but with my broken right
> hand I cannot always work. It seems to me that a reasonable basis for
> discussing AMD EPYC performance could be the performance data in the
> Daresbury benchmark suite from M. Guest. Yes, newer versions of AMD
> EPYC and Xeon Scalable processors have appeared since then, along with
> new compiler versions. However, Intel already had AVX-512 support at
> that time, while AMD only had 256-bit AVX2.
> Of course, peak performance is not as important as application
> performance. There are applications whose performance is not limited
> by vector work; there, AVX-512 may not be needed. And in AI tasks,
> where vector work is very much relevant, GPUs are often used. For AI,
> on the other hand, the Daresbury benchmark is less relevant. And in
> Zen 4, AMD seems set to support 512-bit vectors.
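To make the vector-width point above concrete, here is a minimal back-of-the-envelope sketch in Python. The core counts, clock speed, and the assumption of two FMA units per core are illustrative only, not spec-sheet figures for any particular CPU:

```python
# Back-of-the-envelope peak double-precision throughput, illustrating
# why 512-bit vectors matter for peak FLOPS.  All figures below are
# illustrative assumptions, not spec-sheet values for a specific CPU.

def peak_gflops(cores, ghz, simd_bits, fma_units=2):
    """Peak DP GFLOPS = cores * GHz * DP lanes * 2 FLOPs/FMA * FMA units."""
    lanes = simd_bits // 64  # 64-bit doubles per SIMD register
    return cores * ghz * lanes * 2 * fma_units

# A hypothetical 64-core AVX2 (256-bit) part matches a hypothetical
# 32-core AVX-512 part at the same clock: twice the cores for the
# same peak, which is exactly the trade-off discussed above.
print(peak_gflops(64, 2.5, 256))  # 2560.0
print(peak_gflops(32, 2.5, 512))  # 2560.0
```

Of course, sustained application performance also depends on memory bandwidth and on AVX-512 clock throttling, so peak numbers like these are only an upper bound.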
> But the performance of linear algebra does not always require a GPU.
> In quantum chemistry, you can get acceleration from vectors on the
> V100, say a factor of 2, but how much more expensive is the GPU?
> Of course, support for 512-bit vectors is a plus, but you really need
> to look at application performance and cost (including power
> consumption). I prefer to look at the A64FX now, although applications
> may need to be rebuilt for it. Servers with the A64FX are on sale now,
> but the price is very important.
>
> In message from John Hearns <hear...@gmail.com> (Sun, 20 Jun 2021
> 06:38:06 +0100):
>> Regarding benchmarking real-world codes on AMD, every year Martyn
>> Guest presents a comprehensive set of benchmark studies at the UK
>> Computing Insights Conference.
>> I suggest a Sunday afternoon with the beverage of your choice is a
>> good time to settle down and take time to read these or watch the
>> presentation.
>>
>> 2019
>> https://www.scd.stfc.ac.uk/SiteAssets/Pages/CIUK-2019-Presentations/Martyn_Guest.pdf
>>
>> 2020 video session
>> https://ukri.zoom.us/rec/share/ajvsxdJ8RM1wzpJtnlcypw4OyrZ9J27nqsfAG7eW49Ehq_Z5igat_7gj21Ge8gWu.78Cd9I1DNIjVViPV?startTime=1607008552000
>>
>> Skylake / Cascade Lake / AMD Rome
>>
>> The slides for 2020 do exist; as I remember, the slides from all the
>> talks are grouped together, but I cannot find them. Watch the video:
>> it is an excellent presentation.
>>
>> On Sat, 19 Jun 2021 at 16:49, Gerald Henriksen <ghenr...@gmail.com> wrote:
>>> On Wed, 16 Jun 2021 13:15:40 -0400, you wrote:
>>>> The answer given, and I'm not making this up, is that AMD listens
>>>> to their users and gives the users what they want, and right now
>>>> they're not hearing any demand for AVX512.
>>>>
>>>> Personally, I call BS on that one.
>>>> I can't imagine anyone in the HPC community saying "we'd like
>>>> processors that offer only 1/2 the floating point performance of
>>>> Intel processors".
>>>
>>> I suspect that is marketing speak, which roughly translates to: not
>>> that no one has asked for it, but rather that requests haven't
>>> reached a threshold where they are viewed as significant enough.
>>>
>>>> Sure, AMD can offer more cores, but with only AVX2 you'd need twice
>>>> as many cores as Intel processors, all other things being equal.
>>>
>>> But of course all other things aren't equal.
>>>
>>> AVX512 is a mess.
>>>
>>> Look at the Wikipedia page(*) and note that AVX512 means different
>>> things depending on the processor implementing it.
>>>
>>> So what does the poor software developer target?
>>>
>>> It can also, for heat reasons, cause CPU frequency reductions,
>>> meaning real-world performance may not match theoretical, so it is
>>> easier to just go with GPUs.
>>>
>>> The result is that most of the world is quite happily (at least for
>>> now) ignoring AVX512 and going with GPUs as necessary, particularly
>>> given the convenient libraries that Nvidia offers.
>>>
>>>> I compared a server with dual AMD EPYC 7H12 processors (128 cores)
>>>> to one with quad Intel Xeon 8268 processors (96 cores).
>>>>
>>>> From what I've heard, the AMD processors run much hotter than the
>>>> Intel processors, too, so I imagine a FLOPS/Watt comparison would
>>>> be even less favorable to AMD.
>>>
>>> Spec sheets would indicate AMD runs hotter, but then again you
>>> benchmarked twice as many Intel processors.
>>>
>>> So, per the spec sheets for your processors above:
>>>
>>> AMD - 280W - 2 processors means the system is 560W
>>> Intel - 205W - 4 processors means the system is 820W
>>>
>>> (and then you also need to factor in purchase price).
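The spec-sheet arithmetic quoted above can be written out as a short Python script. Note that TDP is a nominal spec-sheet figure, not measured power draw under load, so the watts-per-core numbers are an upper-bound sketch rather than a benchmark:

```python
# System power and watts-per-core from the spec-sheet TDPs quoted above.
# TDP is a nominal figure, not measured draw, so treat this as a rough
# upper bound rather than a benchmark result.

systems = {
    "2x AMD EPYC 7H12":   {"sockets": 2, "tdp_w": 280, "cores": 128},
    "4x Intel Xeon 8268": {"sockets": 4, "tdp_w": 205, "cores": 96},
}

for name, s in systems.items():
    total_w = s["sockets"] * s["tdp_w"]
    print(f"{name}: {total_w} W total, {total_w / s['cores']:.2f} W per core")

# Prints:
# 2x AMD EPYC 7H12: 560 W total, 4.38 W per core
# 4x Intel Xeon 8268: 820 W total, 8.54 W per core
```

By this rough measure the dual-socket AMD box comes out well ahead per core, which is why benchmarking equal-socket (or equal-cost) systems matters when quoting FLOPS/Watt.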
>>>> An argument can be made that calculations that lend themselves to
>>>> vectorization should be done on GPUs instead of the main
>>>> processors, but the last time I checked, GPU jobs are still memory
>>>> limited, and moving data in and out of GPU memory can still take
>>>> time, so I can see situations where, for large amounts of data,
>>>> using CPUs would be preferred over GPUs.
>>>
>>> AMD's latest chips support PCIe 4 while Intel is still stuck on
>>> PCIe 3, which may or may not make a difference.
>>>
>>> But despite all of the above and the other replies, it is AMD who
>>> has been winning the HPC contracts of late, not Intel.
>>>
>>> * - https://en.wikipedia.org/wiki/Advanced_Vector_Extensions

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf