Hi, to throw in one more:

On 17 Feb 2009, at 20:51, Chris Dagdigian wrote:

On Feb 17, 2009, at 2:29 PM, Michael Will wrote:

What features differentiate SGE in support of life science workflow
from LSF/PBS/Torque/Condor?

Is anyone using IBM's LoadLeveler for Linux, and why? I saw some time ago that it's available, but I got the impression that it mostly addresses sites which already have LoadLeveler on AIX and don't want to introduce a second queuing system into their infrastructure.


They all have their pros and cons. Heck, I'm still an LSF zealot when cost is not an issue: Platform has the best APIs, documentation and layered products for the industry types who need to stand these things up in full production mode within enterprise organizations that may have varying levels of Linux/HPC/MPI experience.

The short list of why Grid Engine became popular in the life sciences:

LSF: great product, but commercial-only, with a pricing model that can get out of hand (I remember when having more than 4GB of RAM in a Linux 1U pushed me into an obscene license tier ...).

Condor: Did not have the fine-grained policy and resource allocation tools that make life easier when you need a shared cluster resource supporting multiple competing users, groups, projects and workflows. The policy tools in LSF/SGE/PBS were more capable. When I saw Condor out in the field, it seemed to be used mostly at academic sites and in situations where cycles from PC systems were being aggregated across LAN-, metro- and WAN-scale distances. Bio problems tend to be more I/O- or memory-bound than CPU-bound, so most bio clusters tend to be closely situated racks of gear.

PBS/TORQUE: I'll ignore the FUD from back in the day, when people were claiming that PBS lost jobs and data at high scale, and concentrate on just one key differentiator. At the time when life science was transitioning from big SGI Altix and Tru64 AlphaServer machines to commodity compute farms, PBS did not support the concept of array jobs. If there was one overwhelming cluster resource management feature essential for bio work, it would be array tasks, because we tend to have a very high concentration of batch/serial workflows that involve running an application many, many times in a row with varying input files and parameter options. The cliche example in bioinformatics is needing to run half a million BLAST searches; without array task scheduling this would require 500,000 individual job submissions. The fact that I never met a serious PBS shop that had not made local custom changes to the source code also soured me on deploying it when I was putting such things into conservative IT shops that were still new to, and fearful of, Linux.

One thing more: AFAIK, Torque has no scheduler built in besides the FIFO one. You will need Maui (free) or Moab (commercial) to get a real scheduler, with the side effect that you have to use both "qstat" (for Torque) and "showq" (for Maui) to investigate the status of your jobs.

-- Reuti


We also don't make heavy use of Globus-style, WAN-scale, capital-"G" Grid computing, as most of our workflows and pipelines are actually bound by the speed of storage rather than by CPU or memory. It was always easier, cheaper and more secure to colocate dedicated CPU resources next to fast storage than to distribute things out as far as possible.

The big news in Bio-IT these days is actually the terabyte-scale wet-lab instruments such as confocal microscopes and next-gen DNA sequencing systems that can produce 1-3TB of raw data per experiment. Some of these lab instruments ship with software pipelines developed to run under Grid Engine. A popular example is the Solexa/Illumina Genome Analyzer, which alone has driven SGE uptake in our field. A notable exception is the SOLiD system, which (I think) ships with a Windows front end that hides a back-end ROCKS cluster running either PBS or Torque under the hood.


And from Mark:

How about providing some useful content? For instance, what is it that you think is especially valuable about SGE?

Hopefully I've done some of that with this message. It basically boils down to the fact that, at the time our field started using compute farms in a serious manner, SGE offered the best overall combination of features, price, and fine-grained resource allocation and policy control.

I think what made us a bit different from some other use cases is our heavy use of serial/batch workflows, combined with our tendency to require that our HPC infrastructures support multiple (and potentially competing) workflows and pipelines, which made the policy/allocation features a key selection criterion. We also do little if any true WAN-scale "grid" computing, due to workflows that tend to be storage/I/O-bound more than anything else.

For people starting fresh with a cluster scheduling layer, who did not have an existing investment in time, expertise and/or software licensing costs, Grid Engine turned out to be a popular choice. With that popularity came a good set of people in the community who can now support and configure these systems (as well as evangelize them), so the cycle is fairly self-perpetuating.


General life science cluster cheat sheet:

- Workloads tend to be far more serial/batch in nature than truly parallel
- Policy and resource allocation features are very important to people deploying these systems
- Storage speed is often more important than network speed or latency
- Fast interconnects are often used for cluster/distributed filesystems rather than application message passing
- Our MPI codes are often quite horrific from an efficiency/tuning standpoint; GigE works just as well as Myrinet or IB
- Exceptions to the MPI rule: computational chemistry, modeling and structure prediction (those fields have well-written commercial MPI codes in use)
- Huge resistance to improved algorithms, as scientists want to use *exactly* the same code that was used to publish the journal paper


-Chris




_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
