Re: [Beowulf] cluster deployment and config management

Christopher Samuel Tue, 05 Sep 2017 16:52:43 -0700

On 05/09/17 15:24, Stu Midgley wrote:

> I am in the process of redeveloping our cluster deployment and config
> management environment and wondered what others are doing?


xCAT here for all HPC related infrastructure.  Stateful installs for
GPFS NSD servers and TSM servers; compute nodes are all statelite, so an
immutable RAMdisk image is built on the management node for the compute
cluster and then on boot they mount various items over NFS (including
the GPFS state directory).

Nothing like your scale, of course, but it works and we know if a node
has booted a particular image it will be identical to any other node
that's set to boot the same image.

Healthcheck scripts mark nodes offline if they don't have the current
production kernel and GPFS versions (and other checks too, of course),
plus Slurm's "scontrol reboot" lets us do rolling reboots without
needing to spot when nodes have become idle.
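
To make the healthcheck idea concrete, here is a minimal sketch in Python.
It is not our actual script: the version strings, the function names and
the way the running GPFS version gets collected are all assumptions for
illustration.  It only compares expected against running versions and
builds the "scontrol update" command that would drain a mismatched node;
actually running that command is left to the caller.

```python
import platform

# Hypothetical site policy. These version strings are placeholders,
# not values taken from any real cluster.
EXPECTED = {"kernel": "3.10.0-693.el7.x86_64", "gpfs": "4.2.3-4"}


def find_mismatches(expected, actual):
    """Return one reason string per component whose version differs."""
    return [
        f"{name}: want {want}, have {actual.get(name, 'missing')}"
        for name, want in sorted(expected.items())
        if actual.get(name) != want
    ]


def drain_command(node, reasons):
    """Build the argv for draining a node in Slurm, with the reasons attached."""
    return [
        "scontrol", "update",
        f"NodeName={node}",
        "State=DRAIN",
        "Reason=" + "; ".join(reasons),
    ]


def running_versions(gpfs_version=None):
    """Collect running versions.

    Only the kernel is read directly. How the GPFS version is queried is
    site-specific, so it is passed in by the caller (assumption).
    """
    versions = {"kernel": platform.release()}
    if gpfs_version is not None:
        versions["gpfs"] = gpfs_version
    return versions
```

A wrapper would call find_mismatches(EXPECTED, running_versions(...)) and,
if the list is non-empty, hand drain_command(...) to subprocess.run().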

I've got to say I really prefer this to systems like Puppet, Salt, etc,
where you need to go and tweak an image after installation.

For our VM infrastructure (web servers, etc) we do use Salt. We
used to use Puppet but we switched when the only person who understood
it left.  Don't miss it at all...

cheers,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
