Thank you very much Greg, Douglas, John and Michael. You very kindly "overwhelmed" me (and I thank you for that) with hints about things I didn't know. So my very next step will be to work through each of your hints, and I expect I will come back with some more practical questions about them.

In the meantime, just to clarify a few things about my project:
- for the heavy computing part I'm using C++, and for the web-server side golang;
- to speed up the hardest computing part, after trying other tools, I came to the conclusion that HPX, despite being quite complicated, can be of help;
- I would like to use Kafka as a distributed message broker between the golang web server and the C++ computing parts. Being essentially a distributed, fault-tolerant, append-only log, it should help keep the parts "playing the same tune" (a small sketch of what I have in mind follows below);
- I've been using Ubuntu as the OS and, if possible, I would like to keep using it in the distributed environment as well.
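Just to make the Kafka point a bit more concrete, here is a rough sketch of what the golang web server could publish (assuming the segmentio/kafka-go client, a broker on localhost:9092 and a topic named "compute-jobs" - all of these are placeholders, not decisions I've made yet); the C++ workers would then consume the same topic, e.g. through librdkafka:

    package main

    import (
        "context"
        "log"

        "github.com/segmentio/kafka-go"
    )

    func main() {
        // Writer used by the web server to hand work off to the C++ compute side.
        // Broker address and topic name are placeholders for illustration only.
        w := &kafka.Writer{
            Addr:     kafka.TCP("localhost:9092"),
            Topic:    "compute-jobs",
            Balancer: &kafka.LeastBytes{},
        }
        defer w.Close()

        // Each message is an append-only record of a job request; the C++ workers
        // (e.g. via librdkafka) read them back in order within a partition.
        err := w.WriteMessages(context.Background(),
            kafka.Message{
                Key:   []byte("job-42"),
                Value: []byte(`{"task":"heavy-computation","params":[1,2,3]}`),
            },
        )
        if err != nil {
            log.Fatal("failed to write message: ", err)
        }
    }

The append-only log is what should keep the web side and the compute side in step even if one of them restarts.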
Marco

On Sun, 3 Mar 2019 at 18:10, Greg Keller <gregwkel...@gmail.com> wrote:

> I third OpenHPC, or at least the Warewulf underpinnings in it.
> http://warewulf.lbl.gov/
>
> For "learning" the software stack you may consider beefing up your current node and running a virtualized environment inside it? I use the community version of Proxmox (https://www.proxmox.com/en/downloads). On Ubuntu, Virt-Manager + QEMU + KVM is equally capable but a bit less obvious for configuring VMs & containers. Running 3 nodes, each with 8GB RAM, and leaving 8GB for the host should be sufficient to get the software set up and test the basic adminish stuff and strategy.
>
> The key things for a real cluster IMHO are:
> 1) SSH configuration - ssh keys for passwordless access to all compute nodes
> 2) A shared filesystem - NFS, Lustre, or for virtual machines on a severe budget, Plan 9 (https://en.wikipedia.org/wiki/9P_(protocol)). Maybe put this NFS and a couple of old disks on an old Atom-based machine you've been holding the door open with.
> 3) A capable scheduler - slurm being a current favorite, but there are several tried-and-true options that may be better for your specific project
> 4) Systems management - RAM-based filesystems like Warewulf supports are great because a reboot ensures that any bit-rot on a "node" is fixed... especially if you format the local "scratch" hard disk on boot :). I see a lot of ansible and other methods that seem popular but above my pea brain or budget.
> 5) Parallel shells - I used PDSH a lot, but several attempts have been made over the years. You almost can't have too many ways to run in parallel.
> 6) Remote power control and consoles - IPMI/BMC or equivalent is a must-have when you scale up, but for the starter kit it would be good to have too. Even some really low-end stuff has them these days, and it's a feature you'll quickly consider essential. For a COTS cluster without the built-in BMC, this looks promising... https://github.com/Fmstrat/diy-ipmi
>
> Not really required, but I mention my good friends Screen and Byobu that have saved my bacon many times when an unexpected disconnect (power / network etc.) of my client would have ravaged a system into an unknown state.
>
> Bonus points for folks who manage & monitor the cluster. When something's broken, does the system tell you before the users do? If yes, you have the "Right Stuff" being monitored.
>
> For me the notion that clusters cannot be heterogeneous is overstated. Assuming you compile on a given node (a master or login node, or a shell to a compute node with a dev environment installed), at a minimum you want the code to run on the other nodes. Similar generations of processors make this pretty likely. Identical makes it simple, but probably not worth the cost in an experiment/learning environment unless you plan to benchmark results. Setting up queues of identical nodes, so that a code runs efficiently on a given subset of nodes, is a fair compromise. None of this matters in the virtual machine environment if you decide to start there.
>
> And everything Doug just said... :)
>
> On Sun, Mar 3, 2019 at 3:25 AM John Hearns via Beowulf <beowulf@beowulf.org> wrote:
>
>> I second OpenHPC. It is actively maintained and easy to set up.
>>
>> Regarding the hardware, have a look at Doug Eadline's Limulus clusters. I think they would be a good fit.
>> Doug's site is excellent in general: https://www.clustermonkey.net/
>>
>> Also some people build Raspberry Pi clusters for learning.
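Regarding Greg's point 5 above (parallel shells): before adopting pdsh I will probably experiment with a tiny fan-out helper of my own, just to understand what these tools do. A toy sketch in Go (the node names are hypothetical, and it assumes the passwordless ssh keys from point 1 are already in place):

    package main

    import (
        "fmt"
        "os/exec"
        "sync"
    )

    func main() {
        // Hypothetical node names; on a real cluster they would come from a
        // hosts file or the scheduler.
        nodes := []string{"node01", "node02", "node03"}
        command := "uptime"

        var wg sync.WaitGroup
        for _, n := range nodes {
            wg.Add(1)
            go func(node string) {
                defer wg.Done()
                // One ssh session per node, all running concurrently.
                out, err := exec.Command("ssh", node, command).CombinedOutput()
                if err != nil {
                    fmt.Printf("%s: error: %v\n", node, err)
                    return
                }
                fmt.Printf("%s: %s", node, out)
            }(n)
        }
        wg.Wait()
    }

Obviously this is not a replacement for pdsh or a scheduler; it is only to convince myself of how the pieces fit together.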
>>
>> On Sun, 3 Mar 2019 at 01:16, Renfro, Michael <ren...@tntech.edu> wrote:
>>
>>> Heterogeneous is possible, but the slower system will be a bottleneck if you have calculations that require both systems to work in parallel and synchronize with each other periodically. You might also find bottlenecks with your network interconnect, even on homogeneous systems.
>>>
>>> I've never used ROCKS, and OSCAR doesn't look to have been updated in a few years (maybe it doesn't need to be). OpenHPC is a similar product, more recently updated. But except for the cluster I manage now, I always just went with a base operating system for the nodes and added HPC libraries and services as required.
>>>
>>> > On Mar 2, 2019, at 7:34 AM, Marco Ippolito <ippolito.ma...@gmail.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I'm developing an application which needs to use tools and other applications that excel in a distributed environment:
>>> > - HPX ( https://github.com/STEllAR-GROUP/hpx )
>>> > - Kafka ( http://kafka.apache.org/ )
>>> > - a blockchain tool.
>>> > This is why I'm eager to learn how to deploy a Beowulf cluster.
>>> >
>>> > I've read some info here:
>>> > - https://en.wikibooks.org/wiki/Building_a_Beowulf_Cluster
>>> > - https://www.linux.com/blog/building-beowulf-cluster-just-13-steps
>>> > - https://www-users.cs.york.ac.uk/~mjf/pi_cluster/src/Building_a_simple_Beowulf_cluster.html
>>> >
>>> > And I have 2 starting questions, in order to clarify how I should proceed for a correct cluster build:
>>> >
>>> > 1) My starting point is the PC I'm working on at the moment, with these features:
>>> > - Corsair RAM, DDR3, PC1600, 32 GB, CL10
>>> > - Intel Core i7-4790K CPU (socket 1150), 4.00 GHz
>>> > - Samsung MZ-76E500B 860 EVO internal SSD, 500 GB, 2.5" SATA III, black/grey
>>> > - ASUS H97-PLUS motherboard
>>> > - DVD-RW drive
>>> >
>>> > I'm using Ubuntu 18.04.01 Server Edition as the OS.
>>> >
>>> > On one side I read that it is better to put the same type of HW in the same cluster (PCs of the same type), but on the other side heterogeneous HW (servers or PCs) can also be deployed. So... which HW should I take into consideration for the second node, if the features of the very first "node" are the ones above?
>>> >
>>> > 2) I read that some software (Rocks, OSCAR) would make the cluster configuration easier and smoother. But I also read that using the same OS, with exactly the same version, for all nodes (in my case Ubuntu 18.04.01 Server Edition) could be a safe start. So... is it strictly necessary to use Rocks or OSCAR to correctly configure the node network?
>>> >
>>> > Looking forward to your kind hints and suggestions.
>>> > Marco
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf