Jeff White wrote:

I've never used Bright. I touched it and talked to a salesperson at a conference, but I wasn't impressed.

Unpopular opinion: I don't see a point in using "cluster managers" unless you have a very tiny cluster and zero Linux experience. These are just Linux boxes with a couple of applications (e.g. Slurm) running on them. Nothing special. xCAT/Warewulf/Scyld/Rocks just get in the way more than they help, IMO. They are mostly crappy wrappers around free software (e.g. ISC's dhcpd) anyway. When they aren't, it's proprietary trash.

I install CentOS nodes and use Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and software. This also means I'm not stuck with "node images" and can instead build everything as plain old text files (read: write SaltStack states), update them at will, and push changes any time. My "base image" is CentOS and I need no "baby's first cluster" HPC software to install/PXE-boot it. YMMV.
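
For concreteness, an entire node role can live in one plain-text state file like the minimal sketch below (the package names, paths, and the salt:// source are illustrative only and vary by distro/repo):

    # slurm-node.sls -- minimal sketch; package/service names are
    # hypothetical and depend on your repos
    slurm_packages:
      pkg.installed:
        - pkgs:
          - slurm
          - slurm-slurmd
          - munge

    # drop the cluster-wide config as a plain text file
    /etc/slurm/slurm.conf:
      file.managed:
        - source: salt://hpc/files/slurm.conf
        - user: root
        - group: root
        - mode: '0644'

    # keep slurmd running; restart it whenever the config changes
    slurmd:
      service.running:
        - enable: True
        - watch:
          - file: /etc/slurm/slurm.conf

Edit the text file, run a highstate, done -- no golden image to rebuild.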



Totally legit opinion and probably not unpopular at all given the user mix on this list!

The issue here is assuming a level of domain expertise with Linux, bare-metal provisioning, DevOps and (most importantly) HPC-specific configuration that may be pervasive or easily available in your environment, but is often not in a commercial/industrial environment where HPC or "scientific computing" is just another business area that a large central IT organization must support.

If you have that level of expertise available, then the self-managed DIY method is best. It's also my preference.

But in the commercial world, where HPC is becoming more and more important, you run into stuff like:

- Central IT may not actually have anyone on staff who knows Linux (more common than you'd expect; I see this in Pharma/Biotech all the time)

- The HPC user base is not given the budget or resources to self-support their own stack, because of a drive to centralize IT ops and support

- And if they do have Linux people on staff, they may be novice-level or have zero experience with HPC schedulers, MPI fabric tweaking and application needs (the domain stuff)

- And if miracles occur and they do have expert-level Linux people, then more often than not these people are overworked or stretched in many directions


So what happens in these environments is that organizations will willingly (and happily) pay commercial pricing and adopt closed-source products if those products deliver a measurable reduction in administrative, operational or support burden.

This is where Bright, Univa etc. all come in -- you can buy stuff from them that dramatically reduces the care and feeding that onsite/local IT has to do.

Just having a vendor to call for support on Grid Engine oddities makes the cost of Univa licensing worthwhile. Just having a vendor like Bright be on the hook for "cluster operations" is a huge win for an overworked IT staff that does not have Linux or HPC specialists on staff or easily available.

My best example of "paying to reduce operational burden in HPC" comes from a massive, well-known genome shop in the Cambridge, MA area. They often tell this story:

- 300 TB of new data generation per week (many years ago)
- One of the initial storage tiers was ZFS running on commodity server hardware
- Keeping the DIY ZFS appliances online and running took the FULL-TIME efforts of FIVE STORAGE ENGINEERS

They realized that staff support was not scalable with DIY ZFS at 300 TB/week of new data generation, so they went out and bought a giant EMC Isilon scale-out NAS platform.

And you know what? After the Isilon NAS was deployed, the management of *many* petabytes of single-namespace storage was handled by the IT Director in his "spare time" -- and the five engineers who used to do nothing but keep ZFS from falling over were re-assigned to more impactful and presumably more fun/interesting work.


They actually went on stage at several conferences and told the story of how Isilon allowed senior IT leadership to manage petabyte volumes of data "in their spare time" -- this was a huge deal and really resonated. It reinforced for me how, in some cases, it's actually a good idea to pay $$$ for commercial stuff if it delivers gains in ops/support/management.


Sorry to digress! This is a topic near and dear to me. I often have to do HPC work in commercial environments where the skills simply don't exist onsite. Or more commonly -- they have the budget to buy software or hardware, but they are under a hiring freeze and are not allowed to bring in new humans.

Quite a bit of my work on projects like this is helping people make sober decisions regarding "build" or "buy" -- and in those environments it's totally clear that for some things it makes sense for them to pay for an expensive, commercially supported "thing" that they don't have to manage or support themselves.


My $.02 ...





