Hi Everyone,

Thank you all for the feedback and insights.
So I am starting to see a pattern: some combination of CentOS + Ansible +
OpenHPC + SLURM + old CUDA/NVIDIA drivers ;-).

Sean, thank you for those links; they will certainly accelerate the journey.
(Note to anyone looking: you need to remove the ":" at the end of the link,
else you will get a 404.)

Finally, yes, I am very aware that the hardware is long in the tooth, but it
is what I have for the time being. Once my capability outstrips the
capability of the hardware, then I am bound to upgrade. At that point I plan
to have a manageable cluster that I can add/remove/upgrade at will :-).

Thanks again to everyone for the responses and insights. Will let you all
know how I go over the coming weeks.

Cheers

Richard

> On 22 Aug 2019, at 1:26 am, Sean McGrath <smcg...@tchpc.tcd.ie> wrote:
>
> Hi guys,
>
> I was on the Programme Committee for the HPC Systems Professionals
> Workshop, HPCSYSPROS18, at Supercomputing last year,
> http://sighpc-syspros.org/workshops/2018/index.php.html.
>
> A couple of the submissions I reviewed may be of interest here.
>
> (1) Rapid Deployment of Bare-Metal and In-Container HPC Clusters Using
> OpenHPC playbooks.
>
> This was presented. It is essentially a set of Ansible playbooks to get
> a cluster up and running as quickly as possible.
>
> From their GitHub, https://github.com/XSEDE/CRI_XCBC:
>
> "This repo will get you to the point of a working slurm installation
> across your cluster. It does not currently provide any scientific
> software or user management options!
>
> The basic usage is to set up the master node with the initial 3 roles
> (pre_ohpc, ohpc_install, ohpc_config) and use the rest to build node
> images, and deploy the actual nodes (these use Warewulf as a
> provisioner by default)."
>
> (2) clusterworks - was not presented at HPCSYSPROS18; it lost out to
> the above marginally but is very similar to the first one. From their
> https://github.com/clusterworks/inception:
>
> "clusterworks is a toolkit that brings together the best modern
> technologies in order to create fast and flexible turn-key HPC
> environments, deployable on bare-metal infrastructure or in the cloud"
>
> They may be of some use here. Instead of having to start everything
> from scratch, you can build on top of those foundations. I don't know
> how current those projects are or whether they are still being
> developed, though.
>
> Sean
>
>
> On Wed, Aug 21, 2019 at 10:27:41AM -0400, Alexander Antoniades wrote:
>
>> We have been building out a cluster based on commodity servers (mainly
>> Gigabyte motherboards) with 8x 1080ti/2080ti per server.
>>
>> We are using a combination of OpenHPC-compiled tools and Ansible. I would
>> recommend using the OpenHPC software so you don't have to deal with
>> figuring out which versions of the tools you need and manually building
>> them, but I would not go down their prescribed way of building a cluster
>> with base images and all for a small heterogeneous cluster. I would just
>> build the machines as consistently as you can and then use the OpenHPC
>> versions of programs where needed, and augment the management with
>> something like Ansible or even pdsh.
>>
>> Also, unless you're really just doing this as an exercise to kill time on
>> weekends, or you literally have no money and can get free power/cooling,
>> I would really consider looking at what other, more modern hardware is
>> available, or at least benchmark your system against a sample cloud
>> system if you really want to learn GPU computing.
>>
>> Thanks,
>>
>> Sander
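
A note for anyone else working through Sean's first link: the documented
flow is just the three master-node roles run in order, with the remaining
roles building the Warewulf image and deploying the compute nodes. The
rough driver I have in mind looks something like the Python sketch below
(only the role names come from the repo's README; the playbook file names
and the inventory path are my guesses, so adjust them to the repo's real
layout):

    #!/usr/bin/env python3
    # Rough sketch only: run the CRI_XCBC master-node roles in their
    # documented order and stop at the first failure. The role names come
    # from the repo's README; the playbook file names and inventory path
    # are guesses for illustration.
    import subprocess
    import sys

    INVENTORY = "inventory/headnode"   # guess: point this at the real inventory
    MASTER_ROLES = ["pre_ohpc", "ohpc_install", "ohpc_config"]  # per the README

    def run_role(role):
        """Run one playbook via ansible-playbook and bail out if it fails."""
        cmd = ["ansible-playbook", "-i", INVENTORY, role + ".yaml"]
        print("==> " + " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            sys.exit("role %s failed, stopping here" % role)

    if __name__ == "__main__":
        for role in MASTER_ROLES:
            run_role(role)
        # The remaining roles build the node image and deploy the nodes;
        # run them the same way once the master node looks healthy.

Running ansible-playbook by hand for each step works just as well; the
wrapper is only there to enforce the ordering. For anything ad hoc across
the nodes afterwards, plain pdsh as Sander suggests should cover it.
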
>>
>> On Wed, Aug 21, 2019 at 1:56 AM Richard Edwards <e...@fastmail.fm> wrote:
>>
>>> Hi John
>>>
>>> No doom and gloom.
>>>
>>> It's in a purpose-built workshop/computer room that I have: 42U rack,
>>> cross-draft cooling, which is sufficient, and 32 amp power into the PDUs.
>>> The equipment is housed in the 42U rack along with a variety of other
>>> machines, such as a Sun Enterprise 4000 and a 30-CPU Transputer cluster.
>>> None of it runs 24/7 and not all of it is on at the same time, mainly
>>> because of the cost of power :-/
>>>
>>> Yeah, the Tesla 1070s scream like a banshee...
>>>
>>> I am planning on running it as a power-on-on-demand setup, which I
>>> already do through some HP iLO and APC PDU scripts that I have for these
>>> machines. Until recently I have been running some of them as a vSphere
>>> cluster and others as standalone CUDA machines.
>>>
>>> So that's one vote for OpenHPC.
>>>
>>> Cheers
>>>
>>> Richard
>>>
>>> On 21 Aug 2019, at 3:45 pm, John Hearns via Beowulf <beowulf@beowulf.org>
>>> wrote:
>>>
>>> Add up the power consumption for each of those servers. If you plan on
>>> installing this in a domestic house, or indeed in a normal office
>>> environment, you probably won't have enough amperage in the circuit you
>>> intend to power it from. Sorry to be all doom and gloom.
>>>
>>> Also, this setup will make a great deal of noise. If in a domestic
>>> setting, put it in the garage. In an office setting the obvious place is
>>> a comms room, but be careful about the ventilation. Office comms rooms
>>> often have a single wall-mounted air conditioning unit. Make SURE to run
>>> a temperature shutdown script. This air con unit WILL fail over a
>>> weekend.
>>>
>>> Regarding the software stack, I would look at OpenHPC. But that's just
>>> me.
>>>
>>> On Wed, 21 Aug 2019 at 06:09, Dmitri Chubarov <dmitri.chuba...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> This is very old hardware, and you would have to stay with a very
>>>> outdated software stack, as the 1070 cards are not supported by recent
>>>> versions of the NVIDIA drivers, and old versions of the NVIDIA drivers
>>>> do not play well with modern kernels and modern system libraries.
>>>> Unless you are doing this for digital preservation, consider dropping
>>>> the 1070s out of the equation.
>>>>
>>>> Dmitri
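
For what it's worth, the temperature shutdown script John mentions above
doesn't need to be anything fancy. A rough sketch of what I have in mind,
in Python (the 85 C limit, the poll interval, and the reliance on the
kernel's /sys/class/thermal zones are placeholder choices for illustration,
not recommendations):

    #!/usr/bin/env python3
    # Rough sketch of a temperature watchdog: read every thermal zone the
    # kernel exposes and power the machine off if anything crosses a limit.
    # The threshold and poll interval below are placeholders, not advice.
    import glob
    import subprocess
    import time

    LIMIT_MILLIDEG = 85000   # 85 C in the kernel's millidegree units
    POLL_SECONDS = 60

    def hottest_zone_millideg():
        """Return the highest temperature reported by any thermal zone."""
        readings = []
        for path in glob.glob("/sys/class/thermal/thermal_zone*/temp"):
            try:
                with open(path) as f:
                    readings.append(int(f.read().strip()))
            except (OSError, ValueError):
                continue   # some zones can be unreadable; skip them
        return max(readings, default=0)

    if __name__ == "__main__":
        while True:
            if hottest_zone_millideg() >= LIMIT_MILLIDEG:
                # Needs root, e.g. run from a root cron job or systemd unit.
                subprocess.run(["/sbin/shutdown", "-h", "now", "over-temperature"])
                break
            time.sleep(POLL_SECONDS)

The power-on half is less exciting: assuming the iLOs will talk IPMI over
the LAN, it is essentially an ipmitool "chassis power on" aimed at each
node, with the APC PDU's outlet switching covering anything that can't be
woken that way.
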
>>>>
>>>> On Wed, 21 Aug 2019 at 06:46, Richard Edwards <e...@fastmail.fm> wrote:
>>>>
>>>>> Hi Folks
>>>>>
>>>>> So, about to build a new personal GPU-enabled cluster, and I am
>>>>> looking for people's thoughts on distribution and management tools.
>>>>>
>>>>> Hardware that I have available for the build:
>>>>> - HP ProLiant DL380/360 - mix of G5/G6
>>>>> - HP ProLiant SL6500 with 8 GPUs
>>>>> - HP ProLiant DL580 G7 + 2x K20X GPUs
>>>>> - 3x Nvidia Tesla 1070 (4 GPUs per unit)
>>>>>
>>>>> Appreciate people's insights/thoughts.
>>>>>
>>>>> Regards
>>>>>
>>>>> Richard
>
> --
> Sean McGrath M.Sc
>
> Systems Administrator
> Trinity Centre for High Performance and Research Computing
> Trinity College Dublin
>
> sean.mcgr...@tchpc.tcd.ie
>
> https://www.tcd.ie/
> https://www.tchpc.tcd.ie/
>
> +353 (0) 1 896 3725

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf