On 08/03/16 15:43, Jeff Friedman wrote:
> Hello all. I am just entering the HPC Sales Engineering role, and would
> like to focus my learning on the most relevant stuff. I have searched
> near and far for a current survey of some sort listing the top used
> "stacks", but cannot seem to find one that is free. I was breaking
> things down similar to this:
All the following is just what we use here, but in your role you'll probably need to be familiar with most of the options, I would have thought, based on customer requirements. Specialisation in your preferred suite is down to you of course!

> _OS distro_: CentOS, Debian, TOSS, etc? I know some come trimmed down,
> and also include specific HPC libraries, like CNL, CNK, INK?

RHEL here - the hardware vendors' support attitude tends to be "we support both types of Linux, RHEL and SLES".

> _MPI options_: MPICH2, MVAPICH2, Open MPI, Intel MPI, ?

Open MPI

> _Provisioning software_: Cobbler, Warewulf, xCAT, Openstack, Platform HPC, ?

xCAT

> _Configuration management_: Warewulf, Puppet, Chef, Ansible, ?

xCAT

We use Puppet for our infrastructure VMs (running Debian).

> _Resource and job schedulers_: I think these are basically the same
> thing? Torque, Lava, Maui, Moab, SLURM, Grid Engine, Son of Grid Engine,
> Univa, Platform LSF, etc… others?

Yes and no - we run Slurm and use its own scheduling mechanisms, but you could plug in Moab should you wish. Torque ships an example pbs_sched, but that's just a FIFO; you'd want to look at Maui or Moab for more sophisticated scheduling. (There's a minimal sbatch sketch at the end of this mail.)

> _Shared filesystems_: NFS, pNFS, Lustre, GPFS, PVFS2, GlusterFS, ?

GPFS here - it copes well with lots of small files (looks at one OpenFOAM project that has over 19 million files & directories - mostly directories - and sighs).

> _Library management_: Lmod, ?

I've been using environment modules for almost a decade now, but our most recent cluster has switched to Lmod. (Module usage is sketched at the end of this mail.)

> _Performance monitoring_: Ganglia, Nagios, ?

We use Icinga for monitoring infrastructure, including polling xCAT and Slurm for node information such as error LEDs, down nodes, etc. We have pnp4nagios integrated with our Icinga to record time-series information about memory usage, etc. (Example commands are at the end of this mail.)

> _Cluster management toolkits_: I believe these perform many of the
> functions above, all wrapped up in one tool? Rocks, Oscar, Scyld, Bright, ?

N/A here.

All the best!
Chris

--
Christopher Samuel
Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au
Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/
http://twitter.com/vlsci
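To make the scheduler answer a bit more concrete, here is a minimal sketch of a Slurm batch script of the sort users hand to sbatch. The partition and module names are made-up examples; the #SBATCH directives and srun are standard Slurm.

    #!/bin/bash
    #SBATCH --job-name=example        # name shown by squeue
    #SBATCH --nodes=2                 # number of nodes requested
    #SBATCH --ntasks-per-node=16      # MPI ranks per node
    #SBATCH --time=01:00:00           # wall-clock limit (HH:MM:SS)
    #SBATCH --partition=compute       # hypothetical partition name

    # load the application's environment (see the modules sketch)
    module load openmpi               # hypothetical module name

    # srun launches the MPI ranks on the nodes Slurm allocated
    srun ./my_mpi_program

The script only describes the resource request and what to run; whether Slurm's own scheduler or an external one like Moab decides when and where it runs is a site configuration choice.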
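For the library management question, environment modules and Lmod look much the same to users; the package names below are hypothetical, but the module commands themselves are standard to both.

    module avail                 # list the software the site provides
    module load gcc openmpi      # put a compiler and an MPI stack in your environment
    module list                  # show what is currently loaded
    module unload openmpi        # drop one package
    module purge                 # return to a clean environment

Each module just adjusts PATH, LD_LIBRARY_PATH and similar variables, which is how sites keep multiple compiler and MPI versions installed side by side.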
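On the monitoring side, the polling I mentioned amounts to wrapping commands like these in Icinga check scripts; sinfo is standard Slurm, nodestat is xCAT, and the node range is a made-up example.

    # Slurm: nodes that are down or drained, with the reason recorded
    sinfo -R

    # Slurm: per-partition summary of node states
    sinfo

    # xCAT: basic status for a range of compute nodes
    nodestat node001-node010

Icinga runs checks like these on a schedule, and pnp4nagios stores the numeric results (memory usage and so on) as time series for graphing.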