Also ease of use with open-source products like OpenMPI,

I don't see this being an issue. OpenMPI will detect your different interconnects and start with the fastest and work its way down to the slowest. OpenMPI has always "just worked" for me, regardless of the network. The only issue is that if it doesn't find IB, it will warn you that it's falling back to Ethernet (since that's probably not what you want), but that warning is easy to turn off in the central config file.
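
For anyone who hasn't had to hunt for it: the knob I use for that is an MCA parameter in the site-wide params file. A minimal sketch, assuming a default install layout (the exact parameter depends on which warning your Open MPI version emits, so treat it as a starting point rather than gospel):

    # <openmpi-prefix>/etc/openmpi-mca-params.conf  (site-wide defaults)
    # silence the "found a network component but not using it" warning
    btl_base_warn_component_unused = 0

The same thing can be set per job with "mpirun --mca btl_base_warn_component_unused 0 ..." if you only want it for specific users.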

But vendors seem to think that high-end ethernet (100-400Gb) is competitive...
Call me a cynical old codger, but I would not be surprised if that's more profitable for them, or if they have other incentives to promote Ethernet over IB. Or, if you prefer Hanlon's razor, maybe they just don't know squat about IB and are selling you what they do know.


Yes, someone is sure to say "don't try characterizing all that stuff -
it's your application's performance that matters!"  Alas, we're a generic
"any kind of research computing" organization, so there are thousands of apps across all possible domains.

<rant>

I agree with you. I've always hated the "it depends on your application" stock response in HPC. I think it's BS. Very few of us work in an environment where we support only a handful of applications with very similar characteristics. I say run standardized benchmarks that test specific performance metrics (memory bandwidth, memory latency, etc.) first, and then use a few applications to confirm what those benchmarks are showing you (see the sketch just below the rant).

</rant>
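
To make "standardized benchmarks" concrete, this is the sort of thing I mean, using STREAM for memory bandwidth and the OSU Micro-Benchmarks for latency and message rate (binary names, paths, and placement flags are illustrative and assume Open MPI's mpirun):

    # memory bandwidth on a single node (STREAM built with OpenMP)
    OMP_NUM_THREADS=$(nproc) ./stream_c.exe

    # point-to-point latency and message rate between two nodes,
    # one rank per node (OSU Micro-Benchmarks)
    mpirun -np 2 --map-by node ./osu_latency
    mpirun -np 2 --map-by node ./osu_mbw_mr

Run those on the loaner hardware first, then confirm with two or three of your heaviest real applications.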

Another interesting topic is that nodes are becoming many-core - any thoughts?

Core counts are getting too high to be useful in HPC. High core-count processors sound great until you realize that all those cores are now competing for the same memory bandwidth and network bandwidth, neither of which increases with core count.

Last April we were evaluating test systems from different vendors for a cluster purchase. One of our test users runs a lot of CFD simulations that are very sensitive to memory bandwidth. He was seeing a 50% speedup on the AMD systems compared to the Intel ones (which makes sense, since the AMDs have 12 memory channels per socket to fill with DIMMs versus Intel's 8), yet he asked us to consider servers with FEWER cores. Even on the AMDs, he was saturating the memory bandwidth before scaling to all the cores, so his performance plateaued. For him, buying cheaper processors with lower core counts was the better deal, since the savings would let us buy additional nodes, which would benefit him far more.
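
That plateau is easy to see for yourself: sweep the thread count on one node and watch the bandwidth curve flatten long before you run out of cores. A rough sketch, assuming a STREAM binary built with OpenMP (binary name and core counts are illustrative):

    # record STREAM Triad bandwidth at increasing thread counts
    for t in 1 2 4 8 16 32 48 64; do
        OMP_NUM_THREADS=$t ./stream_c.exe | awk -v t=$t '/^Triad:/ {print t, $2}'
    done

On memory-bandwidth-starved codes, the numbers stop climbing somewhere well below the full core count, which is exactly what our CFD user was running into.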


Alternatively, are there other places to ask? Reddit or something less "greybeard"?

I've been very disappointed with the "expertise" on the HPC-related subreddits. Last time I lurked there, it seemed very amateurish/DIY-oriented. For example, someone wanted to buy all the individual components and assemble their own nodes for an entire cluster at their job. Can you imagine? Most of the replies were encouraging them to do it....

You might want to join the HPCSYSPROS Slack channel and ask there. HPCSYSPROS is an ACM SIG for HPC system administrators that runs workshops every year at SC. Click on the "Get Involved" link on this page:

https://sighpc-syspros.org/

--
Prentice


On 1/16/24 5:19 PM, Mark Hahn wrote:
Hi all,
Just wondering if any of you have numbers (or experience) with
modern high-speed COTS ethernet.

Latency mainly, but perhaps also message rate.  Also ease of use
with open-source products like OpenMPI, maybe Lustre?
Flexibility in configuring clusters in the >= 1k node range?

We have a good idea of what to expect from Infiniband offerings,
and are familiar with scalable network topologies.
But vendors seem to think that high-end ethernet (100-400Gb) is competitive...

For instance, here's an excellent study of Cray/HP Slingshot (non-COTS):
https://arxiv.org/pdf/2008.08886.pdf
(half rtt around 2 us, but this paper has great stuff about congestion, etc)

Yes, someone is sure to say "don't try characterizing all that stuff -
it's your application's performance that matters!"  Alas, we're a generic
"any kind of research computing" organization, so there are thousands of apps
across all possible domains.

Another interesting topic is that nodes are becoming many-core - any thoughts?

Alternatively, are there other places to ask? Reddit or something less "greybeard"?

thanks, mark hahn
McMaster U / SharcNET / ComputeOntario / DRI Alliance Canada

PS: the snarky name "NVidiband" just occurred to me; too soon?
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf