A quick ack would be nice.

On Fri, 28 Jul 2023, 06:38 John Hearns, <hear...@gmail.com> wrote:

> Andrew, the answer is very much yes. I guess you are looking at the
> interface of 'traditional' HPC which uses workload schedulers and
> Kubernetes style clusters which use containers.
> Firstly I would ask if you are coming from the point of view of someone
> who wants to build a cluster in your home or company using kit which you
> already have.
> Or are you a company which wants to set up an AI infrastructure?
>
> By the way, I think you are thinking of a CPU cluster and scaling out
> using Beowulf concepts.
> In that case you are looking at Horovod https://github.com/horovod/horovod
> One thing though - for AI applications it is common to deploy Beowulf
> clusters which have servers with GPUs as part of their specification.
>
> I think it will be clear to you soon that you will be overwhelmed with
> options and opinions.
> Firstly join the hpc.social community and introduce yourself on the Slack
> channel introductions
> I would start with the following resources:
>
> https://www.clustermonkey.net/
> https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/
> https://catalog.ngc.nvidia.com/containers
> https://openhpc.community/
> https://ciq.com/
> https://qlustar.com/
>
> https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf
> https://omnia-doc.readthedocs.io/en/latest/index.html
>
> Does anyone know if the Bright Easy8 licenses are available? I would say
> that building a test cluster with Easy8 would be the quickest way to get
> some hands-on experience.
>
> You should of course consider cloud providers:
> https://aws.amazon.com/hpc/parallelcluster/
> https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro
> https://cloud.google.com/solutions/hpc
> https://go.oracle.com/LP=134426
>
> On Fri, 28 Jul 2023 at 01:10, Andrew Falgout <andrew.falg...@gmail.com>
> wrote:
>
>> So I'm interested to see if a Beowulf Cluster could be used for Machine
>> Learning, LLM training, and LLM inference. Anyone know where a good entry
>> point is for learning Beowulf Clustering?
>>
>> ./Andrew Falgout
>> KG5GRX
>>
>> On Wed, Jul 26, 2023 at 8:39 AM Michael DiDomenico <mdidomeni...@gmail.com> wrote:
>>
>>> just a mailing list as far as i know. it used to get a lot more
>>> traffic, but seems to have simmered down quite a bit
>>>
>>> On Tue, Jul 25, 2023 at 6:50 PM Andrew Falgout <andrew.falg...@gmail.com> wrote:
>>> >
>>> > Just curious, do we have a discord channel, or just a mailing list?
>>> >
>>> > ./Andrew Falgout
>>> > KG5GRX
>>> >
>>> > On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico <mdidomeni...@gmail.com> wrote:
>>> >>
>>> >> ugh, as someone who worked the front lines in the 00's i got front row
>>> >> seat to the interconnect mud slinging... but frankly if they're going
>>> >> to come out of the gate with a product named "Ultra Ethernet", i smell
>>> >> a loser... :) (sarcasm...)
>>> >>
>>> >> https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/
>>> >> _______________________________________________
>>> >> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>>> >> To change your subscription (digest mode or unsubscribe) visit
>>> >> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf