Jon, RoCE is commonly used. We run GPFS over RoCE, and plenty of other sites do as well.
To answer the question of what network RoCE needs: I guess you could run it on a 1 Gbps network with office-grade network switches. What it really needs is a lossless network. Dare I say the Mellanox word...

I think you would find RoCE is a lot more prevalent than you would think. I guess we should bring in GPUDirect and NVMe over Fabrics here.

Google finds this website: http://www.roceinitiative.org/

On 21 September 2017 at 07:02, Jon Tegner <teg...@renget.se> wrote:

> What about RoCE? Is this something that is commonly used (I would guess no
> since I have not found much)? Are there other protocols that are worth
> considering (like "gamma" which doesn't seem to be developed anymore)?
>
> My impression is that with RoCE you have to use specialized hardware
> (unlike gamma - where one could use standard hardware, and still get a
> noticeable improvement in latency)?
>
> Thoughts?
>
> /jon
>
> On 09/21/2017 04:09 AM, Christopher Samuel wrote:
>
> Thanks Peter for the high level overview! A few followup questions. What
> if I am using a non-Infiniband cluster, i.e something with 10gigE. Or
> even slower like at my home I have a Raspberry Pi cluster with 100 Mbps
> ethernet. Is ofed/psm/verbs all irrelevant?
>
> Pretty much, yes, unless you've got fancy switches that can do RoCE.
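As a footnote to the question of whether verbs are relevant on an Ethernet cluster: on Linux, one rough way to see whether a node exposes RDMA devices over Ethernet is to look at the link_layer files under /sys/class/infiniband. The sketch below is only an illustration. The function names are made up for this example, and it assumes the kernel RDMA stack is loaded. An Ethernet link layer can mean RoCE or iWARP, so treat the result as a hint rather than proof.

```python
from pathlib import Path


def rdma_ports(sysfs_root="/sys/class/infiniband"):
    """Return (device, port, link_layer) for each RDMA port under sysfs_root.

    Hypothetical helper for illustration; returns an empty list when the
    directory is missing (no RDMA stack loaded, or not a Linux host).
    """
    root = Path(sysfs_root)
    found = []
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        ports_dir = dev / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            link_file = port / "link_layer"
            if link_file.is_file():
                # The file normally reads "InfiniBand" or "Ethernet".
                layer = link_file.read_text().strip()
                found.append((dev.name, port.name, layer))
    return found


def ethernet_rdma_ports(sysfs_root="/sys/class/infiniband"):
    """Return (device, port) pairs whose link layer is Ethernet.

    Assumption: an Ethernet link layer suggests RoCE (or iWARP); this
    does not say anything about whether the switches are lossless.
    """
    return [(dev, port)
            for dev, port, layer in rdma_ports(sysfs_root)
            if layer == "Ethernet"]


if __name__ == "__main__":
    ports = rdma_ports()
    if not ports:
        print("No RDMA devices found")
    for dev, port, layer in ports:
        print(f"{dev} port {port}: link layer {layer}")
```

On a node with no RDMA-capable adapter, such as the Raspberry Pi cluster mentioned above, this would be expected to report that no devices were found.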
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf