I'll throw in my $0.02 since I might be an oddball with how I build
things...
On 03/07/2016 08:43 PM, Jeff Friedman wrote:
Hello all. I am just entering the HPC Sales Engineering role, and
would like to focus my learning on the most relevant stuff. I have
searched near and far for a current survey of some sort listing the
top used “stacks”, but cannot seem to find one that is free. I was
breaking things down similar to this:
_OS distro_: CentOS, Debian, TOSS, etc? I know some come trimmed
down, and also include specific HPC libraries, like CNL, CNK, INK?
CentOS 7. In fact, the base OS for each of my nodes is created with just:
yum groups install "Compute Node" --releasever=7 \
    --installroot=/node_roots/sn2
... which is currently in ZFS and exported via NFSv4.
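To make that concrete, the whole workflow for building a node root looks
roughly like this (the "tank" pool name and the 10.1.0.0/16 cluster network
below are only illustrative placeholders):

# create a ZFS dataset to hold the node's root filesystem
zfs create -p -o mountpoint=/node_roots/sn2 tank/node_roots/sn2

# populate it with the stock CentOS 7 "Compute Node" package group
yum groups install "Compute Node" --releasever=7 \
    --installroot=/node_roots/sn2

# export it read-write over NFS to the cluster network
echo '/node_roots/sn2 10.1.0.0/16(rw,no_root_squash,async)' >> /etc/exports
exportfs -ra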
_MPI options_: MPICH2, MVAPICH2, Open MPI, Intel MPI, ?
All of the above (pretty much whatever our users want us to install).
_Provisioning software_: Cobbler, Warewulf, xCAT, Openstack, Platform
HPC, ?
We started with xCAT but moved away for various reasons. Provisioning is
done without this type of management software in my cluster. I have a
simple Python script to configure a new node's DHCP, PXE boot file, and
NFS export (each node has its own writable root filesystem served to it
via NFS). It's designed to be as simple an answer to "how can I PXE
boot CentOS?" as I could get.
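For illustration, the three things such a script writes for a new node look
roughly like this (node name, MAC, addresses, and paths are all made-up
placeholders, and the exact root= syntax depends on your initramfs):

NODE=node042
MAC=01-00-25-90-aa-bb-cc   # pxelinux config name: "01-" + MAC, dash-separated
IP=10.1.0.42

# 1. per-node PXE config pointing the kernel at the node's own NFS root
cat > /var/lib/tftpboot/pxelinux.cfg/${MAC} <<EOF
DEFAULT centos7
LABEL centos7
  KERNEL vmlinuz
  APPEND initrd=initrd.img root=nfs:10.1.0.1:/node_roots/${NODE}:rw ip=dhcp
EOF

# 2. a DHCP reservation so the node always PXE boots with the same address
cat >> /etc/dhcp/dhcpd.conf <<EOF
host ${NODE} {
  hardware ethernet 00:25:90:aa:bb:cc;
  fixed-address ${IP};
  next-server 10.1.0.1;
  filename "pxelinux.0";
}
EOF
systemctl restart dhcpd

# 3. an export of the node's private, writable root filesystem
echo "/node_roots/${NODE} ${IP}(rw,no_root_squash,async)" >> /etc/exports
exportfs -ra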
_Configuration management_: Warewulf, Puppet, Chef, Ansible, ?
SaltStack! This is what does the heavy lifting. Nodes boot with a very
generic CentOS image which only has 1 significant change from stock: a
Salt minion is installed. After a node boots, Salt takes over and
installs software, mounts remote filesystems, cooks dinner, starts
daemons, brings each node into the scheduler, etc. I don't maintain
"node images"; I maintain Salt states that do all the work after a node
boots.
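The moving parts are small; a hypothetical sketch of the flow (node name
and targeting pattern are just examples, and salt-minion comes from EPEL or
the SaltStack repo):

# the one deviation from stock in the node root: a Salt minion
yum install -y --installroot=/node_roots/sn2 salt-minion

# once the node has PXE booted and its minion has checked in, accept its
# key on the Salt master and apply the states that do the real work
salt-key --accept=node042.cluster
salt 'node042*' state.apply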
_Resource and job schedulers_: I think these are basically the same
thing? Torque, Lava, Maui, Moab, SLURM, Grid Engine, Son of Grid
Engine, Univa, Platform LSF, etc… others?
We briefly used Torque+Moab before running away crying. We now use SLURM.
_Shared filesystems_: NFS, pNFS, Lustre, GPFS, PVFS2, GlusterFS, ?
NFS (others may come in the future; we're looking at Ceph at the moment).
_Library management_: Lmod, ?
Lmod.
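From the user side that's just the usual module commands to pick between
the toolchains and MPI builds mentioned above (module names here are only
examples, not a list of what we provide):

module avail              # list the modulefiles visible to the user
module load gcc openmpi   # put a compiler and an MPI on the user's paths
module list               # show what is currently loaded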
_Performance monitoring_: Ganglia, Nagios, ?
Ganglia and in the near future, Zabbix.
_Cluster management toolkits_: I believe these perform many of the
functions above, all wrapped up in one tool? Rocks, Oscar, Scyld,
Bright, ?
Does anyone have any observations as to which of the above are the
most common? Or is that too broad? I believe most of the clusters I
will be involved with will be in the 128 - 2000 core range, all on
commodity hardware.
Thank you!
- Jeff
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf