Jörg, I think I might know where the Lustre storage is! It is possible to install storage routers (LNet routers, in Lustre terms), so you could route between Ethernet and InfiniBand. It is also worth saying that Mellanox have Metro InfiniBand switches - though I do not think they go as far as the west of London!
Seriously though, you ask about RoCE. I will stick my neck out and say yes: if you are planning an OpenStack cluster with the intention of running mixed AI and 'traditional' HPC workloads, I would go for a RoCE-style setup. In fact I am in a discussion about a new project for a customer with similar aims in an hour's time. I could get some benchmarking time if you want a direct comparison of GROMACS on IB versus RoCE - I have pasted a rough sketch of how I would drive such a comparison below Jörg's mail.

On Thu, 26 Nov 2020 at 11:14, Jörg Saßmannshausen <sassy-w...@sassy.formativ.net> wrote:
> Dear all,
>
> as the DNS problems have been solved (many thanks for doing this!), I was
> wondering if people on the list have some experience with this question:
>
> We are currently in the process of purchasing a new cluster and we want to
> use OpenStack for the whole management of the cluster. Part of the cluster
> will run HPC applications like GROMACS, for example; other parts will run
> typical OpenStack applications like VMs. We are also implementing a Data
> Safe Haven for the more sensitive data we are aiming to process. Of course,
> we want to have a decent-sized GPU partition as well!
>
> Now, traditionally I would say that we are going for InfiniBand. However,
> for reasons I don't want to go into right now, our existing file storage
> (Lustre) will be in a different location. Thus, we decided to go for RoCE
> for the file storage and InfiniBand for the HPC applications.
>
> The point I am struggling with is to understand whether this is really the
> best solution, or whether, given that we are not building a 100k-node
> cluster, we could use RoCE for the few nodes which are doing parallel,
> read MPI, jobs too.
> I have a nagging feeling that I am missing something if we move to pure
> RoCE and ditch the InfiniBand. We have a mixed workload, from ML/AI to MPI
> applications like GROMACS to pipelines like the ones used in the
> bioinformatics corner. We are not planning to partition the GPUs; the
> current design model is to have only 2 GPUs in a chassis.
> So, is there something I am missing, or is the stomach feeling I have
> really a lust for some sushi? :-)
>
> Thanks for your sentiments here, much welcome!
>
> All the best from a dull London
>
> Jörg
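P.S. Here is the rough harness I had in mind for the IB / RoCE comparison. This is a minimal sketch only: it assumes Open MPI built against UCX and an MPI-enabled GROMACS build (gmx_mpi), and the device names (mlx5_0:1 / mlx5_2:1), the rank count and the 'benchmark' input are placeholders you would swap for your own.

#!/usr/bin/env python3
# Sketch: run the same GROMACS benchmark once over the IB fabric and once
# over the RoCE fabric by pinning UCX to a single device per run.
import os
import subprocess

CASES = {
    "infiniband": "mlx5_0:1",   # placeholder: InfiniBand HCA port
    "roce":       "mlx5_2:1",   # placeholder: RoCE-capable Ethernet port
}

for label, device in CASES.items():
    env = os.environ.copy()
    env["OMPI_MCA_pml"] = "ucx"        # use the UCX point-to-point layer
    env["UCX_NET_DEVICES"] = device    # keep this run on one fabric only
    env["UCX_TLS"] = "rc,self,sm"      # RDMA transport plus on-node paths
    cmd = ["mpirun", "-np", "128",
           "gmx_mpi", "mdrun", "-deffnm", "benchmark", "-ntomp", "1"]
    print(f"--- {label} run on {device} ---")
    subprocess.run(cmd, env=env, check=True)

Pinning UCX_NET_DEVICES is the important bit - it keeps each run on exactly one fabric, so the ns/day figures from the GROMACS logs are a fair like-for-like comparison.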
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf