On Tue, 22 Mar 2016 17:32:40 +0200 (EET) Olli-Pekka Lehto <olli-pekka.le...@csc.fi> wrote:
> Hi,
>
> I finally got around to writing down my cluster-consistency checklist
> that I've been planning for a long time:
>
> https://github.com/oplehto/cluster-checks/

Looks quite close to what we do. A few additions (randomly floating to
the top):

* use dshbak / pshbak / dbuck to get an overview of pdsh output (the
  latter two are from https://www.nsc.liu.se/~kent/python-hostlist/)

* use conrep to read out BIOS settings from HP servers

* dmidecode -t memory can show DIMM details

We also do most of this automatically in production with our
node-health-check suite (will catch BIOS settings, firmware, CPU and
memory performance, ...).

/Peter K

> The goal is to try to make the baseline installation of a cluster as
> consistent as possible and make vendors work for their money. :) Of
> course hopefully publishing this will help vendors capture some of
> the issues that slip through the cracks even before clusters are
> handed over. It's also a good idea to run these types of checks
> during the lifetime of the system as there's always some consistency
> creep as hardware gets replaced.
>
> If someone is interested in contributing, pull requests or comments
> on the list are welcome. I'm sure that there's something missing as
> well. Right now it's just a text-file but making some nicer scripts
> and postprocessing for the output might happen as well at some point.
> All the examples are very HP oriented as well at this point.
>
> Best regards,
> Olli-Pekka
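
For anyone who has not used dshbak-style tools, here is a rough sketch of the idea behind the first suggestion above: collapse pdsh's "host: text" lines into groups of hosts with identical output, so the odd nodes stand out. This is not how dshbak or python-hostlist implement it. The function name group_by_output, the node names n1..n4 and the "Speed:" sample lines are all made up for illustration.

```python
from collections import defaultdict


def group_by_output(lines):
    """Group hosts whose complete output is identical (hypothetical helper)."""
    per_host = defaultdict(list)
    for line in lines:
        # pdsh prefixes every output line with "hostname: "
        host, sep, text = line.rstrip("\n").partition(": ")
        if not sep:
            continue  # skip anything that is not in "host: text" form
        per_host[host].append(text)

    groups = defaultdict(list)
    for host, texts in per_host.items():
        groups["\n".join(texts)].append(host)

    # Largest group first, so the outliers end up at the bottom.
    return sorted(
        ((sorted(hosts), output) for output, hosts in groups.items()),
        key=lambda group: (-len(group[0]), group[0]),
    )


# Made-up sample standing in for filtered "dmidecode -t memory" output
# collected through pdsh.
sample = [
    "n1: Speed: 2133 MHz",
    "n2: Speed: 2133 MHz",
    "n3: Speed: 1866 MHz",
    "n4: Speed: 2133 MHz",
]

for hosts, output in group_by_output(sample):
    print(",".join(hosts))
    print("    " + output.replace("\n", "\n    "))
```

In practice the real tools are the better choice, since they also fold host names into compact ranges; the sketch only shows why grouped output makes consistency creep easy to spot.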