Hi Mark,

On Feb 18, 2009, at 1:32 AM, Mark Hahn wrote:

>> searches. Without array task scheduling this would require 500,000 individual job submissions. The fact that I never met a serious PBS shop that had not

> what's wrong with 500k job submissions? to me, the existence of "array jobs" is an admission that the job/queueing system is inefficient. if you're saying that the issue is not per-job overhead of submission, but rather that jobs are too short, well, I think that's a user problem. I think it's entirely reasonable to require user jobs to consume some minimum cpu time (say, a few minutes).

Job length can sometimes be an issue, but training users to make sure their jobs take at least a few minutes to complete is pretty easy. The real issue is having very large batch or serial workflows to get through. These are people who are using a cluster not because they are computer scientists or people interested in parallel coding methods - they are scientists trying to get a ton of work done in a reasonable amount of time and with minimal effort.

500K job submissions would put a non-trivial load on just about any scheduler, especially a few years ago. Actually submitting the 500K jobs can be a pain from a usage perspective, and the user and system then have 500K individual jobIDs to track. With an array task I get a single jobID that I can use to track the status of all the sub-tasks, and I can kill the job with a single bkill/qdel command. It's also a single bsub/qsub command to get the ball rolling.
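To make that concrete, here is a rough sketch of what this looks like under SGE (the script name, BLAST arguments and pre-split query file layout are made-up examples, not anyone's production setup):

   #!/bin/sh
   # run_search.sh - one array task per pre-split query file.
   # SGE sets $SGE_TASK_ID for each task (LSF uses $LSB_JOBINDEX).
   #$ -S /bin/sh
   #$ -N seqsearch
   exec blastall -p blastp -d nr \
       -i query.${SGE_TASK_ID}.fa \
       -o hits.${SGE_TASK_ID}.out

   # one submission, one jobID, 500K tasks:
   #   qsub -t 1-500000 run_search.sh
   # and one command to kill the whole thing:
   #   qdel <jobID>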

From a usability and scheduler-efficiency perspective, array jobs are a massive win for large sequential workflows, especially those that consist of running the same application over and over again with only minor differences in command-line arguments or input files.

Array tasks may be distasteful from a technical or elegance perspective, but they are a big usability and throughput win in the real world, especially for end users who care about productivity.



>> - Policy and resource allocation features are very important to people deploying these systems

> so I'm curious what that means. things like "dept A needs to be guaranteed N cpus, but dept B gets to use whatever is left over"? or node choice based on amount of free disk? I don't really see why these sorts of issues would be less important to more parallel environments.

Resource allocation policies, and the tools to implement them, are extremely important and are often a significant part of the selection criteria when trying to figure out which distributed resource manager to use. They matter far more than anything involving parallel environments, simply because there are relatively few MPI-aware applications in our field.

FIFO scheduling, or rewarding the dude who got to work earliest and submitted 500K jobs first, is not the answer. People need to be able to let scientific or business priorities drive and influence how cluster resources are allocated among competing users, projects and departments. For some people it may be as simple as carving up the cluster on a percentage basis among 4 departments; for others the key criterion may be ease of integration with an external flexLM license server.
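On the flexLM point: SGE, for instance, can feed license availability into scheduling decisions via a load sensor script. A hypothetical sketch (the license feature name, lmutil path and lmstat parsing are all assumptions, and the "blast_lic" consumable would have to be defined first with qconf -mc):

   #!/bin/sh
   # report free flexLM seats as a global consumable; sge_execd
   # pokes us on stdin each load interval and we answer with a
   # begin/end block
   while read cmd; do
       [ "$cmd" = "quit" ] && exit 0
       # lmstat prints e.g. "Users of blast:  (Total of 10
       # licenses issued;  Total of 3 licenses in use)",
       # so free = field 6 minus field 11
       free=`lmutil lmstat -c /opt/flexlm/license.dat -f blast | \
             awk '/Users of blast/ {print $6 - $11}'`
       echo begin
       echo "global:blast_lic:$free"
       echo end
   done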

The majority may just want simple fairshare-by-user scheduling behavior without having to drop in some external metascheduler or third party product.
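In SGE terms that usually boils down to just turning on the functional ticket policy, roughly like this (parameter values are illustrative only):

   # scheduler config (qconf -msconf):
   #   weight_tickets_functional   10000
   #
   # then give each user an equal functional share
   # (qconf -muser <name>):
   #   fshare                      100
   #
   # with equal fshare values, pending work gets prioritized so
   # that no single user can monopolize the cluster just by
   # submitting their 500K jobs first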

The quality and capability of the knobs for adjusting this sort of behavior are important in commercial environments and in places where the cluster has been sold as a shared resource to groups with competing needs.

Platform LSF is excellent at this sort of thing, and among the freely available offerings Grid Engine has had good flexibility and capability out of the box without requiring additional plugin products - just another reason why there has been SGE uptake in our field over the years. And since SGE 6.1 added the resource quota framework, SGE is quite powerful in this regard.
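As an example, the four-department split mentioned above might look something like this as a resource quota set (the project names and slot counts are invented):

   # qconf -arqs dept_split
   {
      name         dept_split
      description  "carve slots up among four departments"
      enabled      TRUE
      limit        projects deptA to slots=128
      limit        projects deptB to slots=64
      limit        projects deptC to slots=32
      limit        projects deptD to slots=32
   }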



>> - Storage speed is often more important than network speed or latency in many cases

> which makes me wonder: do bio types consider using map-reduce-like frameworks? that is, basically distributing the work to the data.

Map-reduce goes into the same bin as hardware-based FPGA acceleration, GPU computing and other newish techniques. Modern algorithms, and new efforts by people with real scientific software development and HPC skills, are all looking at these approaches, and you'll see slow uptake over time.
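In spirit, "moving the work to the data" is easy enough to sketch for our classic workloads (node names, paths and the pre-sharded database are all invented):

   # "map": each node searches the database shard on its local disk
   for node in node01 node02 node03 node04; do
       ssh $node "blastall -p blastp -d /local/nr.shard \
                  -i /shared/query.fa \
                  -o /shared/hits.$node" &
   done
   wait
   # "reduce": naive merge of the per-shard hits
   cat /shared/hits.node* > /shared/hits.merged
   # catch: BLAST e-values depend on total database size, so the
   # merged output will NOT match a whole-database run; that is
   # exactly the reproducibility problem described below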

Real progress is being made; see Joe's efforts to get HMMER running on GPUs these days, for example.

This does not quite address the older legacy codes, though. You have to remember that our core applications were written in the early '90s by biologists who had to teach themselves to code simply to get their science done. Few if any of them had real skills in HPC software development or high-efficiency coding. These are the people (like myself) who started using Perl on large-memory 64-bit systems simply because Perl was loose enough to let us do dumb things like read a full genome into a string and run regex operations on it.

If you approached a biologist and said "I re-wrote your BLAST application to use map-reduce!", most would turn around and ask for the citation of the peer-reviewed paper where you published and proved that your map-reduce version produces results and output identical to those of the old, inefficient code it was meant to replace (including reproducing its known bugs).
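In other words, the acceptance test they want is brutally simple - assuming the outputs really are byte-comparable (the map-reduce binary here is hypothetical):

   blastall -p blastp -d nr -i query.fa -o old.out
   mapreduce_blast    -d nr -i query.fa -o new.out
   diff old.out new.out && echo "identical - now go publish it"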

There is huge resistance to improved/updated codes, simply because the scientists want to use the exact method cited in the paper they are trying to reproduce. It's been a hassle to deal with, but the block is real - just ask all of the FPGA hardware-acceleration box makers out there (those that still exist).


-Chris







