I'll throw in my $0.02 since I might be an oddball with how I build things...

On 03/07/2016 08:43 PM, Jeff Friedman wrote:
Hello all. I am just entering the HPC Sales Engineering role, and would like to focus my learning on the most relevant stuff. I have searched near and far for a current survey of some sort listing the top used “stacks”, but cannot seem to find one that is free. I was breaking things down similar to this:

_OS distro_: CentOS, Debian, TOSS, etc? I know some come trimmed down, and also include specific HPC libraries, like CNL, CNK, INK?
CentOS 7.  In fact, the base OS for each of my nodes is created with just:

yum groups install "Compute Node" --releasever=7 --installroot=/node_roots/sn2

... which is currently in ZFS and exported via NFSv4.
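For a bit more context, the ZFS/NFS side of that is nothing exotic; it's roughly the shape below (the pool and dataset names are invented for illustration, not my actual layout):

# Hypothetical pool/dataset names, one dataset per node root.
zfs create -p tank/node_roots/sn2
zfs set mountpoint=/node_roots/sn2 tank/node_roots/sn2

# The NFS server itself is stock CentOS 7 (NFSv4 is enabled by default).
systemctl enable nfs-server
systemctl start nfs-server
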
_MPI options_: MPICH2, MVAPICH2, Open MPI, Intel MPI, ?
All of the above (pretty much whatever our users want us to install).
_Provisioning software_: Cobbler, Warewulf, xCAT, Openstack, Platform HPC, ?
We started with xCAT but moved away for various reasons. In my cluster, provisioning is done without that type of management software. I have a simple Python script to configure a new node's DHCP, PXE boot file, and NFS export (each node has its own writable root filesystem served to it via NFS). It's designed to be as simple an answer to "how can I PXE boot CentOS?" as I could get.
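To give a flavor of what that script does, it basically drops three per-node artifacts, roughly like the following (shown as shell for brevity since the real thing is Python; the hostname, MAC, and IP addresses are made up):

# 1. DHCP reservation for the new node (invented MAC/IP).
cat >> /etc/dhcp/dhcpd.conf <<'EOF'
host sn2 { hardware ethernet 00:11:22:33:44:55; fixed-address 10.0.0.12; }
EOF

# 2. PXE config named after the node's MAC, pointing root at the node's NFS export.
cat > /var/lib/tftpboot/pxelinux.cfg/01-00-11-22-33-44-55 <<'EOF'
default centos7
label centos7
  kernel vmlinuz
  append initrd=initrd.img root=nfs:10.0.0.1:/node_roots/sn2 rw
EOF

# 3. Export that node's private, writable root filesystem.
echo '/node_roots/sn2  10.0.0.12(rw,no_root_squash)' >> /etc/exports
exportfs -ra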

_Configuration management_: Warewulf, Puppet, Chef, Ansible, ?
SaltStack! This is what does the heavy lifting. Nodes boot with a very generic CentOS image which has only one significant change from stock: a Salt minion is installed. After a node boots, Salt takes over and installs software, mounts remote filesystems, cooks dinner, starts daemons, brings each node into the scheduler, etc. I don't maintain "node images"; I maintain Salt states that do all the work after a node boots.
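A trivial example of the sort of state I mean, just to make it concrete (the package, service, and minion names here are illustrative, not my actual states):

mkdir -p /srv/salt/compute
cat > /srv/salt/compute/init.sls <<'EOF'
# Illustrative only: install an MPI stack and keep slurmd running.
openmpi:
  pkg.installed: []

slurmd:
  service.running:
    - enable: True
EOF

# Apply it to all compute minions (hypothetical naming scheme).
salt 'sn*' state.apply compute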

_Resource and job schedulers_: I think these are basically the same thing? Torque, Lava, Maui, Moab, SLURM, Grid Engine, Son of Grid Engine, Univa, Platform LSF, etc… others?
We briefly used Torque+MOAB before running away crying. We now use SLURM.

_Shared filesystems_: NFS, pNFS, Lustre, GPFS, PVFS2, GlusterFS, ?
NFS (others may come in the future; we're looking at Ceph at the moment).

_Library management_: Lmod, ?
Lmod.

_Performance monitoring_: Ganglia, Nagios, ?
Ganglia and in the near future, Zabbix.

_Cluster management toolkits_: I believe these perform many of the functions above, all wrapped up in one tool? Rocks, Oscar, Scyld, Bright, ?

Does anyone have any observations as to which of the above are the most common? Or is that too broad? I believe most of the clusters I will be involved with will be in the 128 - 2000 core range, all on commodity hardware.

Thank you!

- Jeff





