Hi guys,

I was on the Programme Committee for the HPC Systems Professionals Workshop
(HPCSYSPROS18) at Supercomputing (SC18) last year,
http://sighpc-syspros.org/workshops/2018/index.php.html. A couple of the
submissions I reviewed may be of interest here.

(1) "Rapid Deployment of Bare-Metal and In-Container HPC Clusters Using
OpenHPC playbooks". This one was presented. It is essentially a set of
ansible playbooks to get a cluster up and running as quickly as possible.
From their GitHub, https://github.com/XSEDE/CRI_XCBC:

"This repo will get you to the point of a working slurm installation across
your cluster. It does not currently provide any scientific software or user
management options! The basic usage is to set up the master node with the
initial 3 roles (pre_ohpc,ohpc_install,ohpc_config) and use the rest to build
node images, and deploy the actual nodes (these use Warewulf as a provisioner
by default)."

(2) clusterworks. This one was not presented at HPCSYSPROS18; it narrowly
lost out to the submission above, but it is very similar. From their GitHub,
https://github.com/clusterworks/inception:

"clusterworks is a toolkit that brings together the best modern technologies
in order to create fast and flexible turn-key HPC environments, deployable on
bare-metal infrastructure or in the cloud"

Either may be of some use here: instead of starting everything from scratch,
you can build on top of those foundations. I don't know how current those
projects are or whether they are still being developed, though.

Sean
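For anyone who wants to try the CRI_XCBC route, a rough sketch of the
workflow that README describes is below. It is only an illustration: the
inventory and playbook file names are placeholders rather than the repo's
actual entry points, and it assumes the roles can be selected by tag, so
check the repo's README for the real invocation.

    #!/usr/bin/env python3
    # Rough sketch only: apply the three head-node roles named in the CRI_XCBC
    # README, then whatever node-image/deployment playbook the repo provides.
    # "inventory", "headnode.yml" and "compute.yml" are placeholder names, and
    # the use of --tags assumes the playbook tags its roles that way.
    import subprocess

    INVENTORY = "inventory"                                       # placeholder
    HEADNODE_ROLES = ["pre_ohpc", "ohpc_install", "ohpc_config"]  # from the README quote

    def run_playbook(playbook, tags=None):
        """Run one playbook, optionally limited to the given role tags."""
        cmd = ["ansible-playbook", "-i", INVENTORY, playbook]
        if tags:
            cmd += ["--tags", ",".join(tags)]
        subprocess.run(cmd, check=True)  # stop early if a step fails

    if __name__ == "__main__":
        run_playbook("headnode.yml", HEADNODE_ROLES)  # master node first (placeholder name)
        run_playbook("compute.yml")                   # node images and deployment (placeholder)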
On Wed, Aug 21, 2019 at 10:27:41AM -0400, Alexander Antoniades wrote:
> We have been building out a cluster based on commodity servers (mainly
> Gigabyte motherboards) with 8x 1080 Ti/2080 Ti per server.
>
> We are using a combination of OpenHPC compiled tools and Ansible. I would
> recommend using the OpenHPC software so you don't have to deal with
> figuring out what versions of the tools you need to get and manually
> building them, but I would not go down their prescribed way of building a
> cluster with base images and all for a small heterogeneous cluster. I
> would just build the machines as consistently as you can, then use the
> OpenHPC versions of programs where needed and augment the management with
> something like ansible or even pdsh.
>
> Also, unless you're really just doing this as an exercise to kill time on
> weekends, or you literally have no money and can get free power/cooling,
> I would really consider looking at what other, more modern hardware is
> available, or at least benchmark your system against a sample cloud
> system if you really want to learn GPU computing.
>
> Thanks,
>
> Sander
>
> On Wed, Aug 21, 2019 at 1:56 AM Richard Edwards <e...@fastmail.fm> wrote:
>
> > Hi John
> >
> > No doom and gloom.
> >
> > It's in a purpose-built workshop/computer room that I have: a 42U rack,
> > cross-draft cooling (which is sufficient) and 32 A power into the PDUs.
> > The equipment is housed in the 42U rack along with a variety of other
> > machines, such as a Sun Enterprise 4000 and a 30-CPU Transputer
> > cluster. None of it runs 24/7 and not all of it is on at the same time,
> > mainly because of the cost of power :-/
> >
> > Yeah, the Tesla 1070s scream like a banshee...
> >
> > I am planning on running it as a power-on, on-demand setup, which I
> > already do through some HP iLO and APC PDU scripts that I have for
> > these machines. Until recently I have been running some of them as a
> > vSphere cluster and others as standalone CUDA machines.
> >
> > So that's one vote for OpenHPC.
> >
> > Cheers
> >
> > Richard
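As a concrete illustration of that kind of power-on, on-demand setup, a
minimal IPMI-based sketch follows. It assumes the iLO interfaces have
IPMI-over-LAN enabled; the hostnames and credential environment variables
are made up for the example, not taken from Richard's setup.

    #!/usr/bin/env python3
    # Minimal sketch of on-demand node power-up via IPMI-over-LAN (e.g. HP iLO),
    # in the spirit of the iLO/PDU scripts mentioned above. The hostnames and
    # credential environment variables are illustrative, not from the thread.
    import os
    import subprocess

    NODES = ["gpu01-ilo", "gpu02-ilo"]  # hypothetical BMC/iLO hostnames

    def chassis_power(node, action):
        """Send a chassis power command ("on", "soft", "off", "status") to one BMC."""
        subprocess.run(
            ["ipmitool", "-I", "lanplus", "-H", node,
             "-U", os.environ["IPMI_USER"], "-P", os.environ["IPMI_PASS"],
             "chassis", "power", action],
            check=True,
        )

    if __name__ == "__main__":
        for node in NODES:
            chassis_power(node, "on")  # power up on demand; "soft" for a graceful shutdown later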
> > On 21 Aug 2019, at 3:45 pm, John Hearns via Beowulf
> > <beowulf@beowulf.org> wrote:
> >
> > > Add up the power consumption for each of those servers. If you plan
> > > on installing this in a domestic house, or indeed in a normal office
> > > environment, you probably won't have enough amperage in the circuit
> > > you intend to power it from. Sorry to be all doom and gloom.
> > >
> > > Also, this setup will make a great deal of noise. If in a domestic
> > > setting, put it in the garage. In an office setting the obvious place
> > > is a comms room, but be careful about the ventilation. Office comms
> > > rooms often have a single wall-mounted air conditioning unit. Make
> > > SURE to run a temperature shutdown script. This air con unit WILL
> > > fail over a weekend.
> > >
> > > Regarding the software stack I would look at OpenHPC. But that's just
> > > me.
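A minimal version of the temperature shutdown script John recommends could
look something like the sketch below; the 85 C threshold and 60-second poll
interval are arbitrary choices, and it assumes nvidia-smi is available on
the node.

    #!/usr/bin/env python3
    # Minimal sketch of a temperature shutdown script: poll the GPUs with
    # nvidia-smi and power the machine off if any of them run too hot.
    # The 85 C threshold and 60 s poll interval are arbitrary choices.
    import subprocess
    import time

    THRESHOLD_C = 85
    POLL_SECONDS = 60

    def gpu_temperatures():
        """Return the current temperature of each GPU in degrees C."""
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"],
            text=True,
        )
        return [int(value) for value in out.split()]

    if __name__ == "__main__":
        while True:
            if any(t >= THRESHOLD_C for t in gpu_temperatures()):
                # Needs root; on a real cluster you would likely drain the node first.
                subprocess.run(["shutdown", "-h", "now"], check=False)
                break
            time.sleep(POLL_SECONDS)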
> > > On Wed, 21 Aug 2019 at 06:09, Dmitri Chubarov
> > > <dmitri.chuba...@gmail.com> wrote:
> > >
> > > > Hi,
> > > > this is very old hardware and you would have to stay with a very
> > > > outdated software stack, as 1070 cards are not supported by recent
> > > > versions of the NVIDIA drivers, and old versions of the NVIDIA
> > > > drivers do not play well with modern kernels and modern system
> > > > libraries. Unless you are doing this for digital preservation,
> > > > consider dropping the 1070s out of the equation.
> > > >
> > > > Dmitri
> > > >
> > > > On Wed, 21 Aug 2019 at 06:46, Richard Edwards <e...@fastmail.fm>
> > > > wrote:
> > > >
> > > > > Hi Folks
> > > > >
> > > > > So, about to build a new personal GPU-enabled cluster, and I am
> > > > > looking for people's thoughts on distribution and management
> > > > > tools.
> > > > >
> > > > > Hardware that I have available for the build:
> > > > > - HP ProLiant DL380/360 - mix of G5/G6
> > > > > - HP ProLiant SL6500 with 8 GPUs
> > > > > - HP ProLiant DL580 G7 + 2x K20x GPUs
> > > > > - 3x Nvidia Tesla 1070 (4 GPUs per unit)
> > > > >
> > > > > Appreciate people's insights/thoughts.
> > > > >
> > > > > Regards
> > > > >
> > > > > Richard

--
Sean McGrath M.Sc
Systems Administrator
Trinity Centre for High Performance and Research Computing
Trinity College Dublin
sean.mcgr...@tchpc.tcd.ie
https://www.tcd.ie/ https://www.tchpc.tcd.ie/
+353 (0) 1 896 3725

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf