On 27/02/2025 at 18:28, Prentice Bisbal wrote:
On 2/27/25 3:19 AM, Brice Goglin wrote:
Hello
While meeting vendors to buy our next cluster, we got different
recommendations about the network for MPI. The cluster will likely be
about 100 nodes. Some vendors claim RoCE is enough to get <2us
latency and good bandwidth for such a small number of nodes. Others
say RoCE is far behind IB for both latency and bandwidth, and that we
likely need IB if we care about network performance.
If anybody has tried MPI over RoCE on such a "small" cluster, what
NICs and switches did you use?
Also, is the configuration easy from the admin's (installation) and
users' (MPI options) points of view?
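For context, by "MPI options" I mean user-side knobs along the lines
of this sketch with Open MPI over UCX (assuming UCX is how we would
drive RoCE; the mlx5_0:1 device name is only an illustrative
placeholder):

  # ask Open MPI for the UCX PML and point UCX at the RoCE port
  mpirun --mca pml ucx \
         -x UCX_NET_DEVICES=mlx5_0:1 \
         -x UCX_TLS=rc,sm,self \
         ./my_mpi_app

In principle UCX auto-detects a suitable RDMA device, so pinning it
explicitly as above is mostly to make the intent clear.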
I hope this isn't a dumb question: do the Ethernet switches you're
looking at have crossbar switches inside them? I believe a crossbar
is a requirement for IB, but is only found in "higher performance"
Ethernet switches. IB isn't just about latency: the crossbar allows
for high bisection bandwidth, non-blocking communication, etc.
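To make that concrete: full bisection bandwidth for a 100-node
cluster means the 50 nodes in one half can all send to the other half
at line rate simultaneously, i.e. with 100 Gb/s ports (an illustrative
figure, not something from our quotes) roughly 50 x 100 Gb/s = 5 Tb/s
through the switch core.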
I don't know, but that's a good question; I will ask the vendors.
Brice
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf