Hi Tim,

many thanks for sharing your experiences here, and sorry for my slow reply; I am currently on annual leave and thus don't check emails on a daily basis.
The additional layer of complexity is probably the price you have to pay for the flexibility. What we are aiming for is a more flexible system so we can move things around, similar to what you are doing. I might come back to this later if you don't mind.

Regards

Jörg

On Wednesday, 1 July 2020 at 12:13:05 BST, Tim Cutts wrote:
> Here, we deploy some clusters on OpenStack, and some traditionally as bare metal. Our largest cluster is actually a mixture of both, so we can dynamically expand it from the OpenStack service when needed.
>
> Our aim eventually is to use OpenStack as a common deployment layer, even for the bare metal cluster nodes, but we’re not quite there yet.
>
> The main motivation for this was to have a common hardware and deployment platform, and have flexibility for VM and batch workloads. We have needed to dynamically change workloads (for example in the current COVID-19 crisis, our human sequencing has largely stopped and we’ve been predominantly COVID-19 sequencing, using an imported pipeline from the consortium we’re part of). Using OpenStack we could get that new pipeline running in under a week, and later moved it from the research to the production environment, reallocating research resources back to their normal workload.
>
> There certainly are downsides; OpenStack is a considerable layer of complexity, and we have had occasional issues, although those rarely affect established running VMs (such as batch clusters). Those occasional problems are usually in the services for dynamically creating and destroying resources, so they don’t have immediate impact on batch clusters. Plus, we tend to use fairly static provider networks to connect the Lustre systems to virtual clusters, which removes another layer of OpenStack complexity.
>
> Generally speaking it’s working pretty well, and we have uptimes in excess of 99.5%.
>
> Tim
>
> On 1 Jul 2020, at 05:09, John Hearns <hear...@gmail.com> wrote:
>
> Jörg, I would back up what Matt Wallis says. What benefits would OpenStack bring you? Do you need to set up a flexible infrastructure where clusters can be created on demand for specific projects?
>
> Regarding InfiniBand, the relevant concept is SR-IOV. This article is worth reading:
> https://docs.openstack.org/neutron/pike/admin/config-sriov.html
>
> I would take a step back and look at your storage technology and which is the best one to be going forward with.
> Also look at the proceedings of the last STFC Computing Insights, where Martyn Guest presented a lot of benchmarking results on AMD Rome. Page 103 onwards in this report:
> http://purl.org/net/epubs/manifestation/46387165/DL-CONF-2020-001.pdf
>
> On Tue, 30 Jun 2020 at 12:21, Jörg Saßmannshausen <sassy-w...@sassy.formativ.net> wrote:
>
> Dear all,
>
> we are currently planning a new cluster, and this time around the idea was to use OpenStack for the HPC part of the cluster as well.
>
> I was wondering if somebody has some first-hand experiences on the list here. One of the things we are currently not so sure about is InfiniBand (or another low-latency network connection, but not Ethernet): can you run HPC jobs on OpenStack which require more than the number of cores within a box? I am thinking of programs like CP2K, GROMACS, NWChem (if that sounds familiar to you), which utilise these kinds of networks very well.
>
> I came across things like Magic Castle from Compute Canada, but as far as I understand it, they are not using it for production (yet).
>
> Is anybody on here familiar with this?
>
> All the best from London
>
> Jörg
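On the point Tim makes above about static provider networks for the Lustre systems: the idea is to map an existing datacentre VLAN directly into the virtual clusters rather than routing the storage traffic through Neutron's virtual routers. A minimal sketch with the OpenStack CLI, where all names (physnet1, the VLAN ID, the subnet range, the flavour and image) are only placeholders for whatever a site actually uses:

    # Expose an existing VLAN carrying Lustre traffic as a shared provider
    # network (physnet1, VLAN 400 and the CIDR are example values only)
    openstack network create --share \
        --provider-network-type vlan \
        --provider-physical-network physnet1 \
        --provider-segment 400 \
        lustre-net

    # Matching subnet; no gateway, since the storage VLAN is not routed here
    openstack subnet create --network lustre-net \
        --subnet-range 10.40.0.0/16 --gateway none \
        lustre-subnet

    # Cluster VMs then simply boot with a second NIC on that network
    openstack server create --flavor hpc.large --image centos7-hpc \
        --network cluster-net --network lustre-net \
        compute-node-01

Because the provider network is just a VLAN trunked to the hypervisors, there is no Neutron router or NAT in the storage path, which is presumably the extra layer of OpenStack complexity Tim says this removes.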
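On John's SR-IOV pointer: the linked Neutron guide essentially boils down to whitelisting the NIC's virtual functions for Nova, enabling the sriovnicswitch mechanism driver, and attaching instances through ports with vnic-type "direct". A rough outline following that (Pike-era) guide, with hypothetical device and physical-network names; option names have changed in later releases, and InfiniBand additionally needs the vendor's (e.g. Mellanox) VF support in host and guest:

    # /etc/nova/nova.conf on the compute hosts: which device's VFs may be
    # passed through, and which physical network they belong to
    [pci]
    passthrough_whitelist = { "devname": "ens2f0", "physical_network": "physnet2" }

    # /etc/neutron/plugins/ml2/ml2_conf.ini: enable the SR-IOV mechanism driver
    [ml2]
    mechanism_drivers = openvswitch,sriovnicswitch

    # /etc/neutron/plugins/ml2/sriov_agent.ini on the compute hosts
    [sriov_nic]
    physical_device_mappings = physnet2:ens2f0

    # Boot an instance with a direct (SR-IOV) port on that network
    PORT_ID=$(openstack port create --network hpc-net --vnic-type direct \
                  -f value -c id sriov-port0)
    openstack server create --flavor hpc.large --image centos7-hpc \
        --nic port-id=$PORT_ID mpi-node-01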
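And on the original question of jobs needing more cores than one box offers: once the instances can reach each other over such a low-latency fabric, the MPI side looks no different from a conventional cluster. A purely hypothetical example, assuming a GROMACS build with MPI enabled and a hostfile listing the instances:

    # hostfile (one line per VM or bare metal node), e.g.
    #   mpi-node-01 slots=32
    #   mpi-node-02 slots=32
    mpirun -np 64 --hostfile hostfile \
        gmx_mpi mdrun -deffnm benchmark -ntomp 1

Whether the latency and bandwidth through the virtualised fabric hold up for CP2K/GROMACS/NWChem at scale is exactly the kind of thing worth benchmarking before committing, which is where reports like the STFC one above come in handy.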