Hi, to throw in one more:
On 17.02.2009, at 20:51, Chris Dagdigian wrote:
On Feb 17, 2009, at 2:29 PM, Michael Will wrote:
What features differentiate SGE in support of life science workflow
from LSF/PBS/Torque/Condor?
is anyone using IBM's LoadLeveler for Linux, and why? I saw that it
has been available for some time now, but I got the impression that
it mostly addresses sites which already run LoadLeveler on AIX and
don't want to introduce a second queuing system into their
infrastructure.
They all have their pros and cons; heck, I'm still an LSF zealot
when cost is not an issue, as Platform has the best APIs,
documentation and layered products for the industry types who need
to stand these things up in full production mode within enterprise
organizations that may have varying levels of Linux/HPC/MPI
experience.
The short list of why Grid Engine became popular in the life sciences:
LSF: a great product, but commercial-only and with a pricing model
that can get out of hand (I remember when having more than 4GB RAM
in a Linux 1U pushed me into an obscene license tier ...).
Condor: did not have the fine-grained policy and resource
allocation tools that make life easier when you need to have a
shared cluster resource supporting multiple competing users,
groups, projects and workflows. The policy tools for LSF/SGE/PBS
were more capable. When I saw Condor out in the field, it seemed to
be used mostly at academic sites and in situations where cycles
from PC systems were being aggregated across LAN-, metro- and WAN-
scale distances. Bio problems tend to be more I/O- or memory-bound
than CPU-bound, so most bio clusters tend to be closely situated
racks of gear.
PBS/TORQUE: I'll ignore the FUD from back in the day when people
were claiming that PBS lost jobs and data at high scale, and
concentrate on just one key differentiator. At the time when life
science was transitioning from big SGI Altix and Tru64 AlphaServer
machines to commodity compute farms, PBS did not support the
concept of array jobs. If there was one overwhelming cluster
resource management feature essential for bio work, it would be
array tasks. This is because we tend to have a very high
concentration of batch/serial workflows that involve running an
application many, many times in a row with varying input files and
parameter options. The cliché example in bioinformatics is needing
to run half a million BLAST searches. Without array task scheduling
this would require 500,000 individual job submissions.
The fact that I never met a serious PBS shop that had not made
local custom changes to the source code also soured me on deploying
it when I was putting such things into conservative IT shops that
were still new to, and fearful of, Linux.
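To make the array-task point concrete: with SGE the half-million-BLAST
pattern collapses into a single "qsub -t 1-500000 blast_array.sh", and
the scheduler launches the script once per task with SGE_TASK_ID set.
A minimal sketch follows -- the script name, input naming scheme and
blastall options are my own illustrative assumptions, not anything
shipped with SGE:

```shell
#!/bin/sh
# blast_array.sh -- hypothetical SGE array-job script (a sketch).
# The "#$" lines are SGE directives; -t 1-500000 turns one qsub
# into 500,000 tasks, each started with its own SGE_TASK_ID.
#$ -t 1-500000
#$ -cwd

TASK=${SGE_TASK_ID:-1}          # SGE exports SGE_TASK_ID per task
INPUT="query_${TASK}.fa"        # assumed input naming: query_<N>.fa
OUTPUT="result_${TASK}.out"     # one result file per task

# Shown as an echo so the sketch runs anywhere; on a real cluster
# this line would be the actual blastall invocation.
echo "blastall -p blastn -i ${INPUT} -o ${OUTPUT}"
```

The point is that the per-task bookkeeping (which input file, which
output file) lives in the script, so the submission side stays a
one-liner no matter how many tasks there are.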
One thing more: AFAIK Torque has no built-in scheduler besides the
FIFO one. You will need MAUI (free) or MOAB (commercial) to get a
real scheduler, with the side effect that you have to use "qstat"
(for Torque) and "showq" (for MAUI) to investigate the status of
your jobs.
-- Reuti
We also don't make heavy use of Globus-style, WAN-scale, capital-
"G" Grid computing, as many of our workflows and pipelines are
actually performance-bound by the speed of storage rather than by
CPU or memory. It was always easier, cheaper and more secure to
colocate dedicated CPU resources next to fast storage than to
distribute things out as far as possible.
The big news in Bio-IT these days is actually the terabyte-scale
wet lab instruments such as confocal microscopes and next-gen DNA
sequencing systems that can produce 1-3TB of raw data per
experiment. Some of these lab instruments ship with software
pipelines developed to run under Grid Engine. A popular example is
the Solexa/Illumina Genome Analyzer, which alone has driven SGE
uptake in our field. A notable exception is the SOLiD system, which
(I think) ships with a Windows front end that hides a back-end
ROCKS cluster running either PBS or Torque under the hood.
And from Mark:
how about providing some useful content - for instance, what is it
that you think is especially valuable about sge?
Hopefully I've done some of that with this message. It basically
boils down to the fact that, at the time our field started using
compute farms in a serious manner, SGE offered the best overall
combination of features, price and fine-grained resource allocation
and policy control. I think what made us a bit different from some
other use cases is our heavy use of serial/batch workflows,
combined with our tendency to require that our HPC infrastructures
support multiple (and potentially competing) workflows and
pipelines, which made the policy/allocation features a key
selection criterion. We also do little if any true WAN-scale "grid"
computing, due to workflows that tend to be more storage/IO-bound
than anything else. For people starting fresh with a cluster
scheduling layer, who did not have an investment in time, expertise
and/or software licensing costs, Grid Engine turned out to be a
popular choice. With that popularity came a good set of people in
the community who can now support and configure these systems (as
well as evangelize them), so the cycle is fairly self-perpetuating.
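As one concrete taste of what "fine-grained resource allocation"
means in SGE: resource quota sets (added with "qconf -arqs") let an
admin cap competing users on a shared cluster. A hypothetical
example -- the quota name and the 64-slot limit are made up for
illustration:

```
{
   name         per_user_slots
   description  "cap any single user at 64 slots on the shared cluster"
   enabled      TRUE
   limit        users {*} to slots=64
}
```

The "{*}" means the limit applies to each user individually rather
than to all users combined, which is exactly the kind of knob you
want when multiple groups and pipelines compete for one cluster.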
General life science cluster cheat sheet:
- Workloads tend to be far more serial/batch in nature than true
parallel
- Policy and resource allocation features are very important to
people deploying these systems
- Storage speed is often more important than network speed or
latency in many cases
- Fast interconnects are often used for cluster/distributed
filesystems rather than application message passing
- Our MPI codes are often quite horrific from an efficiency/tuning
standpoint - gigE works just as well as Myrinet or IB
- Exceptions to the MPI rule: computational chemistry, modeling and
structure prediction (those fields have well written commercial MPI
codes in use)
- Huge resistance to improved algorithms, as scientists want to use
*exactly* the same code that was used to publish the journal paper
-Chris
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf