Please keep the list updated on what you find.

On Mon, 31 Jul 2023 at 20:08, Andrew Falgout <andrew.falg...@gmail.com> wrote:
> Not ignoring you guys, I have literally been moving. We had to downsize,
> I've got no power, and even my main computer is still powered off. There's
> nothing more eerie than a quiet computer room. I'm on an old laptop that I
> threw Linux Mint 21 on to be here now. Okay, to introduce myself a bit
> more.
>
> I've been doing Linux for a long time, but have been in a silo for a long
> time. I feel like I've left so many skills unused that I can't trust them
> anymore. So I'm mentally just marking my cache as dirty and going to
> relearn as much as I can. Great information so far.
>
> I have hardware and storage space to play with (Dell R930, 112 cores/600 GB
> of RAM). The issue is that getting a graphics card into them for compute is
> not proving to be ideal. I have about 4 of these machines, and I'd like to
> play around with clustering, learning how to properly and securely plan
> and implement them. I've played around with Docker, and have used multiple
> Docker servers with Portainer. Next, when I get an electrician to install
> power, is to try to set up a Kubernetes cluster.
>
> When I can get something with some decent compute (not this laptop), I'd
> like to learn how to train a small LLM using the cluster if possible. I
> know I can do a good bit slowly with the CPU. If I can get a GPU in the
> mix, I'd use that to speed things up.
>
> Again, I would like to apologize for being quiet for so long. I'll try
> to toss an "ack" in there from my phone if nothing else.
>
>
> ./Andrew Falgout
> KG5GRX
>
>
> On Mon, Jul 31, 2023 at 6:10 AM John Hearns <hear...@gmail.com> wrote:
>
>> A quick ack would be nice.
>>
>> On Fri, 28 Jul 2023, 06:38 John Hearns, <hear...@gmail.com> wrote:
>>
>>> Andrew, the answer is very much yes. I guess you are looking at the
>>> interface of 'traditional' HPC which uses workload schedulers and
>>> Kubernetes style clusters which use containers.
>>> Firstly I would ask if you are coming from the point of view of someone
>>> who wants to build a cluster in your home or company using kit which you
>>> already have.
>>> Or are you a company which wants to set up an AI infrastructure?
>>>
>>> By the way, I think you are thinking of a CPU cluster and scaling out
>>> using Beowulf concepts.
>>> In that case you are looking at Horovod
>>> https://github.com/horovod/horovod
>>> One thing though - for AI applications it is common to deploy Beowulf
>>> clusters which have servers with GPUs as part of their specification.
>>>
>>>
>>> I think it will be clear to you soon that you will be overwhelmed with
>>> options and opinions.
>>> Firstly join the hpc.social community and introduce yourself in the
>>> introductions Slack channel.
>>> I would start with the following resources:
>>>
>>> https://www.clustermonkey.net/
>>> https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/
>>> https://catalog.ngc.nvidia.com/containers
>>> https://openhpc.community/
>>> https://ciq.com/
>>> https://qlustar.com/
>>>
>>> https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf
>>> https://omnia-doc.readthedocs.io/en/latest/index.html
>>>
>>> Does anyone know if the Bright Easy8 licenses are available? I would say
>>> that building a test cluster with Easy8 would be the quickest way to get
>>> some hands-on experience.
>>>
>>> You should of course consider cloud providers:
>>> https://aws.amazon.com/hpc/parallelcluster/
>>>
>>> https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro
>>> https://cloud.google.com/solutions/hpc
>>> https://go.oracle.com/LP=134426
>>>
>>>
>>> On Fri, 28 Jul 2023 at 01:10, Andrew Falgout <andrew.falg...@gmail.com>
>>> wrote:
>>>
>>>> So I'm interested to see if a Beowulf Cluster could be used for Machine
>>>> Learning, LLM training, and LLM inference.
>>>> Anyone know where a good entry point is for learning Beowulf Clustering?
>>>>
>>>>
>>>> ./Andrew Falgout
>>>> KG5GRX
>>>>
>>>>
>>>> On Wed, Jul 26, 2023 at 8:39 AM Michael DiDomenico <mdidomeni...@gmail.com> wrote:
>>>>
>>>>> just a mailing list as far as i know. it used to get a lot more
>>>>> traffic, but seems to have simmered down quite a bit
>>>>>
>>>>> On Tue, Jul 25, 2023 at 6:50 PM Andrew Falgout <andrew.falg...@gmail.com> wrote:
>>>>> >
>>>>> > Just curious, do we have a discord channel, or just a mailing list?
>>>>> >
>>>>> >
>>>>> > ./Andrew Falgout
>>>>> > KG5GRX
>>>>> >
>>>>> >
>>>>> > On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico <mdidomeni...@gmail.com> wrote:
>>>>> >>
>>>>> >> ugh, as someone who worked the front lines in the 00's i got a front
>>>>> >> row seat to the interconnect mud slinging... but frankly if they're
>>>>> >> going to come out of the gate with a product named "Ultra Ethernet",
>>>>> >> i smell a loser... :) (sarcasm...)
>>>>> >>
>>>>> >> https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/
>>>>> >> _______________________________________________
>>>>> >> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>>>>> >> To change your subscription (digest mode or unsubscribe) visit
>>>>> >> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf