Jeff White wrote:

I've never used Bright. I touched it and talked to a salesperson at a conference, but I wasn't impressed.

Unpopular opinion: I don't see a point in using "cluster managers" unless you have a very tiny cluster and zero Linux experience. These are just Linux boxes with a couple of applications (e.g. Slurm) running on them. Nothing special. xCAT/Warewulf/Scyld/Rocks just get in the way more than they help, IMO. They are mostly crappy wrappers around free software (e.g. ISC's dhcpd) anyway. When they aren't, it's proprietary trash.

I install CentOS nodes and use Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and software. This also means I'm not stuck with "node images" and can instead build everything as plain old text files (read: write SaltStack states), update them at will, and push changes any time. My "base image" is CentOS and I need no "baby's first cluster" HPC software to install/PXE-boot it. YMMV.
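
For concreteness, an entire node role can live in one plain-text state file like the minimal sketch below (the package names, paths, and the salt:// source are illustrative only and vary by distro/repo):

    # slurm-node.sls -- minimal sketch; package/service names are
    # hypothetical and depend on your repos
    slurm_packages:
      pkg.installed:
        - pkgs:
          - slurm
          - slurm-slurmd
          - munge

    # drop the cluster-wide config as a plain text file
    /etc/slurm/slurm.conf:
      file.managed:
        - source: salt://hpc/files/slurm.conf
        - user: root
        - group: root
        - mode: '0644'

    # keep slurmd running; restart it whenever the config changes
    slurmd:
      service.running:
        - enable: True
        - watch:
          - file: /etc/slurm/slurm.conf

Edit the text file, run a highstate, done -- no golden image to rebuild.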



Totally legit opinion and probably not unpopular at all given the user mix on this list!

The issue here is assuming a level of domain expertise with Linux, bare-metal provisioning, DevOps and (most importantly) HPC-specific configuration that may be pervasive or easily available in your environment, but is often not in a commercial/industrial environment where HPC or "scientific computing" is just another business area that a large central IT organization must support.

If you have that level of expertise available, then the self-managed DIY method is best. It's also my preference.

But in the commercial world, where HPC is becoming more and more important, you run into stuff like:

- Central IT may not actually have anyone on staff who knows Linux (more common than you'd expect; I see this in Pharma/Biotech all the time)

- The HPC user base is not given the budget or resources to self-support their own stack, because of a drive to centralize IT ops and support

- And if they do have Linux people on staff, they may be novice-level or have zero experience with HPC schedulers, MPI fabric tweaking and application needs (the domain stuff)

- And if miracles occur and they do have expert-level Linux people, then more often than not these people are overworked or stretched in many directions


So what happens in these environments is that organizations will willingly (and happily) pay commercial pricing and adopt closed-source products if those products deliver a measurable reduction in administrative, operational or support burden.

This is where Bright, Univa etc. all come in -- you can buy stuff from them that dramatically reduces the care and feeding that onsite/local IT has to do.

Just having a vendor to call for support on Grid Engine oddities makes the cost of Univa licensing worthwhile. Just having a vendor like Bright be on the hook for "cluster operations" is a huge win for an overworked IT staff that does not have Linux or HPC specialists on staff or easily available.

My best example of "paying to reduce operational burden in HPC" comes from a massive, well-known genome shop in the Cambridge, MA area. They often tell this story:

- 300 TB of new data generation per week (many years ago)
- One of the initial storage tiers was ZFS running on commodity server hardware
- Keeping the DIY ZFS appliances online and running took the FULL-TIME efforts of FIVE STORAGE ENGINEERS

They realized that staff support was not scalable with DIY ZFS at 300 TB/week of new data generation, so they went out and bought a giant EMC Isilon scale-out NAS platform.

And you know what? After the Isilon NAS was deployed, the management of *many* petabytes of single-namespace storage was handled by the IT Director in his "spare time" -- and the five engineers who used to do nothing but keep ZFS from falling over were re-assigned to more impactful and presumably more fun/interesting work.


They actually went on stage at several conferences and told the story of how Isilon allowed senior IT leadership to manage petabyte volumes of data "in their spare time" -- this was a huge deal and really resonated. It reinforced for me how, in some cases, it's actually a good idea to pay $$$ for commercial stuff if it delivers gains in ops/support/management.


Sorry to digress! This is a topic near and dear to me. I often have to do HPC work in commercial environments where the skills simply don't exist onsite. Or more commonly -- they have the budget to buy software or hardware, but they are under a hiring freeze and are not allowed to bring in new humans.

Quite a bit of my work on projects like this is helping people make sober decisions regarding "build" or "buy" -- and in those environments it's totally clear that for some things it makes sense for them to pay for an expensive, commercially supported "thing" that they don't have to manage or support themselves.


My $.02 ...





