On 2/27/25 3:19 AM, Brice Goglin wrote:
Hello
While meeting vendors to buy our next cluster, we got different
recommendations about the network for MPI. The cluster will likely be
about 100 nodes. Some vendors claim RoCE is enough to get <2us latency
and good bandwidth for such a low number of nodes. Others say RoCE
is far behind IB in both latency and bandwidth, and that we likely need
IB if we care about network performance.
If anybody has tried MPI over RoCE on such a "small" cluster, what NICs
and switches did you use?
Also, is the configuration easy from both the admin's (installation) and
the users' (MPI options) points of view?
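
(Aside for anyone wanting to verify the <2us claim on a pair of test
nodes: osu_latency from the OSU micro-benchmarks is the usual tool, but
even a bare MPI ping-pong will do. A rough sketch, assuming two ranks
placed on two different nodes and a fabric that is already configured,
e.g. Open MPI over UCX:

/* Minimal two-rank MPI ping-pong latency sketch. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 10000, skip = 1000;  /* warm-up before timing */
    char buf[8] = {0};                     /* small message => latency-bound */
    double t0 = 0.0;

    for (int i = 0; i < iters + skip; i++) {
        if (i == skip)
            t0 = MPI_Wtime();              /* start the clock after warm-up */
        if (rank == 0) {
            MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0) {
        /* iters round trips were timed; halve for one-way latency */
        double usec = (MPI_Wtime() - t0) * 1e6 / iters / 2.0;
        printf("one-way latency: %.2f us\n", usec);
    }

    MPI_Finalize();
    return 0;
}

Compile with mpicc and launch something like
"mpirun -np 2 --map-by node ./pingpong". With a UCX-based MPI,
environment variables such as UCX_NET_DEVICES and UCX_TLS are the
typical knobs for steering traffic onto the RoCE device, which covers
much of the user-facing "MPI options" side of the question.)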
I hope this isn't a dumb question: Do the Ethernet switches you're
looking at have crossbar switches inside them? I believe crossbar
switches are a requirement for IB, but are only found in "higher
performance" Ethernet switches. IB isn't just about latency. The
crossbar switches allow for high bisection bandwidth, non-blocking
communication, etc.
--
Prentice
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf