Hi, to throw in one more:

On 17 Feb 2009, at 20:51, Chris Dagdigian wrote:

On Feb 17, 2009, at 2:29 PM, Michael Will wrote:

What features differentiate SGE in support of life science workflow
from LSF/PBS/Torque/Condor?

Is anyone using IBM's LoadLeveler for Linux, and why? I saw some time ago that it's available, but I got the impression that it mostly addresses sites which already have LoadLeveler on AIX and don't want to introduce a second queuing system into their infrastructure.


They all have their pros and cons. Heck, I'm still an LSF zealot when cost is not an issue: Platform has the best APIs, documentation and layered products for the industry types who need to stand these things up in full production mode within enterprise organizations that may have varying levels of Linux/HPC/MPI experience.

The short list of why Grid Engine became popular in the life sciences:

LSF: great product, but commercial-only, with a pricing model that can get out of hand (I remember when having more than 4GB of RAM in a Linux 1U pushed me into an obscene license tier ...).

Condor: Did not have the fine-grained policy and resource allocation tools that make life easier when you need a shared cluster resource supporting multiple competing users, groups, projects and workflows. The policy tools in LSF/SGE/PBS were more capable. When I saw Condor out in the field, it seemed to be used mostly at academic sites and in situations where cycles from PC systems were being aggregated across LAN-, metro- and WAN-scale distances. Bio problems tend to be more I/O- or memory-bound than CPU-bound, so most bio clusters tend to be closely situated racks of gear.

PBS/TORQUE: I'll ignore the FUD from back in the day, when people were claiming that PBS lost jobs and data at high scale, and concentrate on just one key differentiator. At the time when life science was transitioning from big SGI Altix and Tru64 AlphaServer machines to commodity compute farms, PBS did not support the concept of array jobs. If there was one overwhelming cluster resource management feature essential for bio work, it would be array tasks, because we tend to have a very high concentration of batch/serial workflows that involve running an application many, many times in a row with varying input files and parameter options. The cliche example in bioinformatics is needing to run half a million BLAST searches; without array task scheduling this would require 500,000 individual job submissions. The fact that I never met a serious PBS shop that had not made local custom changes to the source code also soured me on deploying it when I was putting such things into conservative IT shops that were still new to, and fearful of, Linux.

One thing more: AFAIK, Torque has no scheduler built in besides the FIFO one. You will need Maui (free) or Moab (commercial) to get a real scheduler, with the side effect that you have to use both "qstat" (for Torque) and "showq" (for Maui) to investigate the status of your jobs.

-- Reuti


We also don't make heavy use of Globus-style, WAN-scale, capital-"G" Grid computing, as most of our workflows and pipelines are actually bound by the speed of storage rather than by CPU or memory. It was always easier, cheaper and more secure to colocate dedicated CPU resources next to fast storage than to distribute things out as far as possible.

The big news in Bio-IT these days is actually the terabyte-scale wet-lab instruments such as confocal microscopes and next-gen DNA sequencing systems that can produce 1-3TB of raw data per experiment. Some of these lab instruments ship with software pipelines developed to run under Grid Engine. A popular example is the Solexa/Illumina Genome Analyzer, which alone has driven SGE uptake in our field. A notable exception is the SOLiD system, which (I think) ships with a Windows front end that hides a back-end ROCKS cluster running either PBS or Torque under the hood.


And from Mark:

How about providing some useful content? For instance, what is it that you think is especially valuable about SGE?

Hopefully I've done some of that with this message. It basically boils down to the fact that, at the time our field started using compute farms in a serious manner, SGE offered the best overall combination of features, price, and fine-grained resource allocation and policy control.

I think what made us a bit different from some other use cases is our heavy use of serial/batch workflows, combined with our tendency to require that our HPC infrastructures support multiple (and potentially competing) workflows and pipelines, which made the policy/allocation features a key selection criterion. We also do little if any true WAN-scale "grid" computing, due to workflows that tend to be storage/I/O-bound more than anything else.

For people starting fresh with a cluster scheduling layer, who did not have an existing investment in time, expertise and/or software licensing costs, Grid Engine turned out to be a popular choice. With that popularity came a good set of people in the community who can now support and configure these systems (as well as evangelize them), so the cycle is fairly self-perpetuating.


General life science cluster cheat sheet:

- Workloads tend to be far more serial/batch in nature than truly parallel
- Policy and resource allocation features are very important to people deploying these systems
- Storage speed is often more important than network speed or latency
- Fast interconnects are often used for cluster/distributed filesystems rather than application message passing
- Our MPI codes are often quite horrific from an efficiency/tuning standpoint; GigE works just as well as Myrinet or IB
- Exceptions to the MPI rule: computational chemistry, modeling and structure prediction (those fields have well-written commercial MPI codes in use)
- Huge resistance to improved algorithms, as scientists want to use *exactly* the same code that was used to publish the journal paper


-Chris




_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
