Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread Stu Midgley
I'm not feeling much love for puppet. On Wed, Sep 6, 2017 at 7:51 AM, Christopher Samuel wrote: > On 05/09/17 15:24, Stu Midgley wrote: > > > I am in the process of redeveloping our cluster deployment and config > > management environment and wondered what others are doing? > > xCAT here for all

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread Christopher Samuel
On 06/09/17 09:51, Christopher Samuel wrote: > Nothing like your scale, of course, but it works and we know if a node > has booted a particular image it will be identical to any other node > that's set to boot the same image. I should mention that we set the osimage for nodes via groups, not dire
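
For illustration, a minimal sketch of that group-based assignment, assuming xCAT's standard chdef/nodeset commands are available; the group and osimage names are made up, not taken from the thread:

    # Assign an osimage to an xCAT node group rather than to individual nodes,
    # so every member of the group boots the same image.
    import subprocess

    def set_group_osimage(group: str, osimage: str) -> None:
        # Record the image on the group definition so nodes added later inherit it.
        subprocess.run(["chdef", "-t", "group", "-o", group,
                        f"provmethod={osimage}"], check=True)
        # Point every current member of the group at that image for its next boot.
        subprocess.run(["nodeset", group, f"osimage={osimage}"], check=True)

    if __name__ == "__main__":
        set_group_osimage("compute", "centos7-x86_64-netboot-compute")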

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread Christopher Samuel
On 05/09/17 15:24, Stu Midgley wrote: > I am in the process of redeveloping our cluster deployment and config > management environment and wondered what others are doing? xCAT here for all HPC related infrastructure. Stateful installs for GPFS NSD servers and TSM servers, compute nodes are all s

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread Douglas Eadline
> Hey everyone, .. any idea what happened with perceus? > http://www.linux-mag.com/id/6386/ > https://github.com/perceus/perceus > > .. yeah; what happened with Arthur Stevens (Perceus, GravityFS/OS Green > Provisioning, etc.) where is he now; who is maintaining perceus, if anyone > ? I would sug

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread John Hearns via Beowulf
Regarding Rocks clusters, permit me to vent a little. In my last employ we provided Rocks clusters. Rocks is firmly embedded in the Red Hat 6 era, with out-of-date 2.6 kernels. It uses kickstart for installations (which is OK). However, with modern generations of Intel processors you get a warning

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread psc
Hey everyone, .. any idea what happened with perceus? http://www.linux-mag.com/id/6386/ https://github.com/perceus/perceus .. yeah; what happened with Arthur Stevens (Perceus, GravityFS/OS Green Provisioning, etc.) where is he now; who is maintaining perceus, if anyone? .. and come on Greg K.

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread Tim Cutts
Sanger made a similar separation with our FAI-based Ubuntu deployments. The FAI part of the installation was kept as minimal as possible, with the task purely of partitioning and formatting the hard disk of the machine, determining the appropriate network card configuration, and unpacking a min

Re: [Beowulf] RAID5 rebuild, remount with write without reboot?

2017-09-05 Thread Joe Landman
On 09/05/2017 01:28 PM, mathog wrote: Short form: An 8 disk (all 2Tb SATA) RAID5 on an LSI MR-USAS2 SuperMicro controller (lspci shows " LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon]") system was long ago configured with a small partition of one disk as /boot and logical volumes for

Re: [Beowulf] RAID5 rebuild, remount with write without reboot?

2017-09-05 Thread Peter St. John
Aren't the drives in the RAID hot-swappable? Removing the defective drive and installing a new one certainly cycled power on those two? But I'm weak at hardware, and have never knowingly relied on firmware on a disk. On Tue, Sep 5, 2017 at 1:52 PM, Andrew Latham wrote: > Without a power cycle up

Re: [Beowulf] RAID5 rebuild, remount with write without reboot?

2017-09-05 Thread Andrew Latham
Without a power cycle, updating the drive firmware would be the only method of tricking the drives into a power-cycle. Obviously very risky. A reboot should be low risk. On Tue, Sep 5, 2017 at 12:28 PM, mathog wrote: > Short form: > > An 8 disk (all 2Tb SATA) RAID5 on an LSI MR-USAS2 SuperMicro c

Re: [Beowulf] RAID5 rebuild, remount with write without reboot?

2017-09-05 Thread John Hearns via Beowulf
David, I have never been in that situation. However, I have configured my fair share of LSI controllers, so I share your pain! (I reserve my real tears for device-mapper RAID). How about a mount -o remount? Did you try that before rebooting? I am no expert here - in the past when I have had non-R
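
The remount John mentions is just a standard mount option; a sketch of doing it in place, with the mount point purely illustrative (run as root, and only once the array is healthy again):

    # Ask the kernel to flip an already-mounted filesystem from read-only
    # back to read-write without unmounting it or rebooting.
    import subprocess

    def remount_rw(mountpoint: str) -> None:
        subprocess.run(["mount", "-o", "remount,rw", mountpoint], check=True)

    if __name__ == "__main__":
        remount_rw("/home")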

[Beowulf] RAID5 rebuild, remount with write without reboot?

2017-09-05 Thread mathog
Short form: An 8-disk (all 2TB SATA) RAID5 on an LSI MR-USAS2 SuperMicro controller (lspci shows "LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon]") system was long ago configured with a small partition of one disk as /boot and logical volumes for / (root) and /home on a single large vir
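
The subject line implies a filesystem that has dropped to read-only; a generic way to confirm which mounts are currently ro before attempting a remount (a sketch, not taken from the original post):

    # List mount points whose current options include "ro", by reading
    # /proc/mounts. Purely informational; nothing here touches the RAID set.
    def read_only_mounts(path: str = "/proc/mounts") -> list:
        ro = []
        with open(path) as mounts:
            for line in mounts:
                fields = line.split()
                mountpoint, options = fields[1], fields[3]
                if "ro" in options.split(","):
                    ro.append(mountpoint)
        return ro

    if __name__ == "__main__":
        for mountpoint in read_only_mounts():
            print(mountpoint)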

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread Arif Ali
>>> Interesting. Ansible has come up a few times. >>> >>> Our largest cluster is 2000 KNL nodes and we are looking towards 10k... >>> so it needs to scale well :) >>> >> We went with ansible at the end of 2015 until we hit a road block with >> it not using a client daemon after a few months. When

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread Joe Landman
Good morning ... On 09/05/2017 01:24 AM, Stu Midgley wrote: Morning everyone. I am in the process of redeveloping our cluster deployment and config management environment and wondered what others are doing? First, everything we currently have is basically home-grown. Nothing wrong with thi

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread Rémy Dernat
Hi, On 05/09/2017 at 08:57, Carsten Aulbert wrote: Hi On 09/05/17 08:43, Stu Midgley wrote: Interesting. Ansible has come up a few times. Our largest cluster is 2000 KNL nodes and we are looking towards 10k... so it needs to scale well :) We went with ansible at the end of 2015 until we

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread Lachlan Musicman
Sure am! Re scale, I can't speak to that because we just don't have that size. But Ansible has been bought/absorbed into Red Hat now, so the Ansible Tower infrastructure may scale. You would need to test :) cheers L. -- "The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic civi

Re: [Beowulf] cluster deployment and config management

2017-09-05 Thread John Hearns via Beowulf
Fusty? Lachlan - you really are from the Western Isles, aren't you? Another word: 'oose' - the fluff which collects under the bed. Or inside servers. On 5 September 2017 at 08:57, Carsten Aulbert wrote: > Hi > > On 09/05/17 08:43, Stu Midgley wrote: > > Interesting. Ansible has come up a fe