On 03/08/2016 11:16 AM, Remy Dernat wrote:
Hi,
On 08/03/2016 09:25, Carsten Aulbert wrote:
Hi
On 03/08/2016 05:43 AM, Jeff Friedman wrote:
Hello all. I am just entering the HPC Sales Engineering role, and would
like to focus my learning on the most relevant stuff. I have searched
near and far for a current survey of some sort listing the most-used
“stacks”, but cannot seem to find one that is free. I was breaking
things down along these lines:
"relevant" stuff is pretty relative to what you want to achieve ;)
_Provisioning software_: Cobbler, Warewulf, xCAT, OpenStack,
Platform HPC, ?
Well, OpenStack is designed for cloud, not for HPC, but perhaps some
people are using OpenStack for that purpose...
You could add Rocks Cluster, Sidus (
http://www.cbp.ens-lyon.fr/doku.php?id=en:developpement:productions:sidus
), Kadeploy ( http://kadeploy3.gforge.inria.fr/ ), Perceus (
http://moo.nac.uci.edu/~hjm/Perceus-Report.html )...
In the case of Debian: FAI
You can also use FAI to deploy non-Debian-like systems. I use it to
deploy Ubuntu, but you can also deploy Red Hat-like systems, even if
that is quite a bit harder. Only the initial boot system (loaded via
DHCP/PXE and then NFS) is Debian (the nfsroot); after that, it can
install whatever you need.
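For example, a rough sketch of kicking off an install this way (the
hostname 'node01' is made up; the flags follow the fai-chboot examples
in the FAI guide):

  # On the FAI server: enable a network install for one node.
  # -I sets FAI_ACTION=install, -F sets the default FAI_FLAGS
  # (verbose, sshd, createvt), -v prints what is being done.
  fai-chboot -IFv node01

  # List the per-host PXE boot configurations to check the result:
  fai-chboot -l

The node then PXE-boots the Debian nfsroot, and the FAI config classes
decide what actually lands on its disk.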
_Configuration management_: Warewulf, Puppet, Chef, Ansible, ?
+ SaltStack ?
Generally, people do not use that kind of tool in HPC, but yes, it
can happen.
Says you! ;)
I used to do my cluster configuration with just a postinstall script in
Kickstart (as you mention below), but once I started using Puppet for my
non-cluster systems, it made little sense to use two different
configuration management methodologies within the enterprise, so I
switched to just calling 'puppet agent' from the postinstall script.
The only difference is that, to reduce overhead, I don't keep the
puppet agent daemon running on the compute nodes; I used gsh to run
'puppet agent' on demand. Nowadays, I'd use pdsh instead of gsh.
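A minimal sketch of that pattern (the Puppet master name and the node
range are placeholders):

  %post
  # Kickstart post-install section: one-shot Puppet run at install
  # time; no agent daemon is left running afterwards. The first run
  # still needs its certificate signed on the master.
  /usr/bin/puppet agent --onetime --no-daemonize --verbose \
      --server puppet.example.com
  %end

and later, on demand from the head node:

  # Trigger a one-shot agent run across the compute nodes in parallel:
  pdsh -w node[001-128] 'puppet agent --onetime --no-daemonize'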
You have images to install the whole cluster and, possibly, a
post-configuration step (Kickstart?) if you have a diskful
configuration.
Then you can use a cluster-command tool like clusterssh, pssh, pdsh...
(see the quick example below).
Old school: CFEngine, ...
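For instance (the node range is made up; dshbak ships with pdsh and
collates identical output):

  # Run a command on all compute nodes in parallel and fold
  # identical output together with 'dshbak -c':
  pdsh -w node[01-64] uptime | dshbak -c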
_Resource and job schedulers_: I think these are basically the same
thing? Torque, Lava, Maui, Moab, SLURM, Grid Engine, Son of Grid
Engine,
Univa, Platform LSF, etc… others?
+ OAR https://oar.imag.fr/
Some job schedulers are also resource managers, but that is not
always true:
https://wiki.hpcc.msu.edu/display/hpccdocs/Resource+Managment+and+Job+Scheduler
For high-throughput computing: HTCondor.
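To make the distinction concrete with SLURM as an example: the
resource manager tracks and allocates the nodes, while the scheduler
decides when a submitted job like this minimal sketch gets to run (the
partition name is a placeholder):

  #!/bin/bash
  # Minimal SLURM batch script; submit with 'sbatch job.sh'.
  #SBATCH --job-name=hello
  #SBATCH --nodes=2
  #SBATCH --ntasks-per-node=16
  #SBATCH --time=00:10:00
  #SBATCH --partition=batch   # placeholder partition name

  # By the time this script runs, the resource manager has already
  # allocated the nodes; srun launches the tasks onto them.
  srun hostname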
_Performance monitoring_: Ganglia, Nagios, ?
Icinga, ...
Shinken, Zabbix, etc. There are also some newer tools built on other
storage and display technologies (InfluxDB, Graphite, Grafana...)...
But for HPC, Ganglia is good enough...
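If you want to see what Ganglia actually collects, gmond will dump its
current metrics as XML to anything that connects to its TCP port (8649
by default; the hostname here is a placeholder):

  # Dump the cluster state held by gmond as XML from any node
  # running it:
  nc headnode 8649 | head -n 20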
Best,
Remy.
PS: good to know about this "small HPC" Google site.
Does anyone have any observations as to which of the above are the most
common? Or is that too broad? I believe most of the clusters I will be
involved with will be in the 128-2000 core range, all on commodity
hardware.
I guess everyone will have their preferences; if you wanted to get
some hard, recent numbers, one way would be to create an online
survey/form and ask many people to participate :)
Cheers
Carsten
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf