Andrew, the answer is very much yes. I guess you are looking at the interface between 'traditional' HPC, which uses workload schedulers, and Kubernetes-style clusters, which use containers. Firstly, I would ask whether you are coming from the point of view of someone who wants to build a cluster at home or in your company using kit which you already have, or whether you are a company which wants to set up an AI infrastructure.
By the way, I think you are thinking of a CPU cluster and scaling out using Beowulf concepts. In that case you are looking at Horovod: https://github.com/horovod/horovod

One thing though - for AI applications it is common to deploy Beowulf clusters which have servers with GPUs as part of their specification.

I think it will be clear to you soon that you will be overwhelmed with options and opinions. Firstly, join the hpc.social community and introduce yourself in the introductions channel on Slack.

I would start with the following resources:

https://www.clustermonkey.net/
https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/
https://catalog.ngc.nvidia.com/containers
https://openhpc.community/
https://ciq.com/
https://qlustar.com/
https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf
https://omnia-doc.readthedocs.io/en/latest/index.html

Does anyone know if the Bright Easy8 licenses are available? I would say that building a test cluster with Easy8 would be the quickest way to get some hands-on experience.

You should of course consider cloud providers:

https://aws.amazon.com/hpc/parallelcluster/
https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro
https://cloud.google.com/solutions/hpc
https://go.oracle.com/LP=134426

On Fri, 28 Jul 2023 at 01:10, Andrew Falgout <andrew.falg...@gmail.com> wrote:

> So I'm interested to see if a Beowulf Cluster could be used for Machine
> Learning, LLM training, and LLM inference. Anyone know where a good entry
> point is for learning Beowulf Clustering?
>
> ./Andrew Falgout
> KG5GRX
>
> On Wed, Jul 26, 2023 at 8:39 AM Michael DiDomenico <mdidomeni...@gmail.com>
> wrote:
>
>> just a mailing list as far as i know. it used to get a lot more
>> traffic, but seems to have simmered down quite a bit
>>
>> On Tue, Jul 25, 2023 at 6:50 PM Andrew Falgout <andrew.falg...@gmail.com>
>> wrote:
>> >
>> > Just curious, do we have a discord channel, or just a mailing list?
>> >
>> > ./Andrew Falgout
>> > KG5GRX
>> >
>> > On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico <mdidomeni...@gmail.com> wrote:
>> >>
>> >> ugh, as someone who worked the front lines in the 00's i got front row
>> >> seat to the interconnect mud slinging... but franky if they're going
>> >> to come out of the gate with a product named "Ultra Ethernet", i smell
>> >> a loser... :) (sarcasm...)
>> >>
>> >> https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf