Thanks everyone! Your replies were very helpful.

>> On Mar 8, 2016, at 2:49 PM, Christopher Samuel <sam...@unimelb.edu.au> wrote:
>>
>> On 08/03/16 15:43, Jeff Friedman wrote:
>>
>>> Hello all. I am just entering the HPC Sales Engineering role, and would
>>> like to focus my learning on the most relevant stuff. I have searched
>>> near and far for a current survey of some sort listing the top used
>>> “stacks”, but cannot seem to find one that is free. I was breaking
>>> things down similar to this:
>>
>> All the following is just what we use here, but in your role you'll
>> probably need to be familiar with most of the options, based on customer
>> requirements. Specialisation in your preferred suite is down to you, of
>> course!
>>
>>> _OS distro_: CentOS, Debian, TOSS, etc? I know some come trimmed down,
>>> and also include specific HPC libraries, like CNL, CNK, INK?
>>
>> RHEL - hardware support attitude of "we support both types of Linux,
>> RHEL and SLES".
>>
>>> _MPI options_: MPICH2, MVAPICH2, Open MPI, Intel MPI, ?
>>
>> Open MPI
>>
>>> _Provisioning software_: Cobbler, Warewulf, xCAT, OpenStack, Platform HPC, ?
>>
>> xCAT
>>
>>> _Configuration management_: Warewulf, Puppet, Chef, Ansible, ?
>>
>> xCAT
>>
>> We use Puppet for infrastructure VMs (running Debian).
>>
>>> _Resource and job schedulers_: I think these are basically the same
>>> thing? Torque, Lava, Maui, Moab, SLURM, Grid Engine, Son of Grid Engine,
>>> Univa, Platform LSF, etc… others?
>>
>> Yes and no - we run Slurm and use its own scheduling mechanisms, but you
>> could plug in Moab should you wish.
>>
>> Torque ships with an example pbs_sched, but that's just a FIFO; you'd
>> want to look at Maui or Moab for more sophisticated scheduling.
>>
>>> _Shared filesystems_: NFS, pNFS, Lustre, GPFS, PVFS2, GlusterFS, ?
>>
>> GPFS here - it copes well with lots of small files (looks at one OpenFOAM
>> project that has over 19 million files & directories - mostly
>> directories - and sighs).
>>
>>> _Library management_: Lmod, ?
>>
>> I've been using environment modules for almost a decade now, but our
>> most recent cluster has switched to Lmod.
>>
>>> _Performance monitoring_: Ganglia, Nagios, ?
>>
>> We use Icinga for monitoring infrastructure, including polling xCAT and
>> Slurm for node information such as error LEDs, down nodes, etc.
>>
>> We have pnp4nagios integrated with our Icinga to record time-series
>> information about memory usage, etc.
>>
>>> _Cluster management toolkits_: I believe these perform many of the
>>> functions above, all wrapped up in one tool? Rocks, Oscar, Scyld, Bright, ?
>>
>> N/A here.
>>
>> All the best!
>> Chris
>> --
>> Christopher Samuel        Senior Systems Administrator
>> VLSCI - Victorian Life Sciences Computation Initiative
>> Email: sam...@unimelb.edu.au      Phone: +61 (0)3 903 55545
>> http://www.vlsci.org.au/          http://twitter.com/vlsci
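For anyone wondering what the Slurm half of the node-state polling Chris
describes might look like, here is a minimal sketch of a Nagios/Icinga-style
check in Python. It is only an illustration, not Chris's actual setup: it
assumes sinfo is on the PATH, the set of "bad" states is arbitrary, and the
xCAT queries (error LEDs etc.) and the Icinga service definitions are
site-specific and left out.

#!/usr/bin/env python3
# Sketch only: a minimal Nagios/Icinga-style check that asks Slurm for
# per-node state and flags anything down or drained. The state list and
# exit-code mapping are illustrative, not taken from the original post.

import subprocess
import sys

BAD_STATES = {"down", "drained", "draining", "fail", "failing"}

def slurm_node_states():
    """Return {node: state} as reported by sinfo (node-oriented, no header)."""
    out = subprocess.run(
        ["sinfo", "-h", "-N", "-o", "%N %T"],
        check=True, capture_output=True, text=True,
    ).stdout
    states = {}
    for line in out.splitlines():
        node, state = line.split()
        # A trailing '*' marks a non-responding node; strip it before matching.
        states[node] = state.rstrip("*")
    return states

def main():
    bad = {n: s for n, s in slurm_node_states().items() if s in BAD_STATES}
    if bad:
        print("CRITICAL: " + ", ".join(f"{n} ({s})" for n, s in sorted(bad.items())))
        sys.exit(2)  # Nagios plugin convention: 2 == CRITICAL
    print("OK: no down or drained nodes")
    sys.exit(0)

if __name__ == "__main__":
    main()

Icinga would run something like this as an ordinary plugin, keyed off the
exit code, and pnp4nagios could graph any perfdata you chose to emit.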
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf