Re: [Beowulf] scheduler policy design

Tim Cutts Tue, 24 Apr 2007 07:07:26 -0700


On 24 Apr 2007, at 1:30 pm, Toon Knapen wrote:

> Tim Cutts wrote:
>>> but what if you have a bi-cpu bi-core machine to which you assign 4 slots. Now one slot is being used by a process which performs heavy IO. Suppose another process is launched that performs heavy IO. In that case the latter process should wait until the first one is done to avoid slowing down the efficiency of the system. Generally however, clusters take only time and memory requirements into account.
>> I think that varies. LSF records the current I/O of a node as one of its load indices, so you can request a node which is doing less than a certain amount of I/O. I imagine the same is true of SGE, but I wouldn't know.
> Indeed, using SGE you could also take this into account. However if someone submits 4 jobs, the jobs do not directly start to generate heavy I/O. So the scheduler might think that the 4 jobs can easily coexist on this same node. However, after a few minutes all 4 jobs start eating disk BW and slow the node down horribly. What would your suggestion be to solve this?

With LSF, you use resource reservation, using an rusage[] statement. Let's say, for example, that you want to keep I/O on the node below 15 MB/sec (just for argument's sake) and you know that your code performs I/O at 5 MB/sec. Let's also assume that the node can only sustain 15 MB/sec in total (which is pathetic, I know, but serves to illustrate the example). This means you know that you only want to start a job if the current I/O load is no more than 10 MB/sec. So, you tell LSF the following:


bsub -R"select[io <= 10000] rusage[io=5000]" ...

So, to show what LSF does in this case, on a single machine with four processors:

This machine, given the above other conditions, would become overloaded if LSF started four jobs on it, but it can cope with three. This is what happens:


Initial state:  0 jobs running, io load is 0.  reserved io is 0.

load+reserved is <= 10000, so LSF starts a job.

State:  1 job running, io load is 0, reserved io is 5000

load+reserved still <= 10000, so LSF starts another job

State:  2 jobs running, io load is 0, reserved io is 10000

load+reserved is still <= 10000, so LSF starts another job

State:  3 jobs running, io load is 0, reserved io is 15000

load+reserved is now > 10000, so LSF will not start the fourth job, even though a processor is available, and the three currently running jobs haven't started performing their massive I/O yet.
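The walk-through above can be sketched as a few lines of Python. This is only a toy model of the admission rule described here (start a job while measured load plus reserved I/O is within the select[] threshold), not LSF's actual scheduler code; the function name admit_jobs and its parameters are made up for illustration, and the units follow the 10000 / 5000 values used in the bsub example.

```python
def admit_jobs(n_slots, select_limit, reserve_per_job, measured_load=0):
    """Toy model: start jobs while measured_load + reserved <= select_limit.

    Returns (jobs_started, total_reserved). The measured load is held
    constant because, as in the example, the jobs have not begun their
    heavy I/O yet when the scheduling decisions are made.
    """
    reserved = 0
    started = 0
    while started < n_slots and measured_load + reserved <= select_limit:
        reserved += reserve_per_job  # the rusage[io=...] reservation
        started += 1
    return started, reserved


# Four processors, select[io <= 10000], rusage[io=5000]
started, reserved = admit_jobs(n_slots=4, select_limit=10000, reserve_per_job=5000)
print(started, reserved)
```

With these numbers the loop stops after three jobs with 15000 reserved, matching the states listed above.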


This scheme works quite well, but has some caveats:

1) It is still vulnerable to someone submitting an I/O intensive job without appropriate resource requirements (but that's back to my original point; if you don't give the scheduler the right information, it can't possibly schedule optimally). You can always implement an esub rule to force people to add the appropriate resources (I do precisely that for memory intensive jobs, using exactly this technique).

2) The syntax Platform use only works well for jobs which use a resource throughout their life, or for a limited period at the beginning. For cases where it only does something for a limited period at the end, you *have* to reserve the resource for the entire lifetime of the job. This isn't optimal, but without a time machine it's hard to do it any other way.
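As a rough illustration of the esub idea in caveat 1, here is a minimal Python sketch. It is not the esub I actually run; the policy (reject any submission without an io reservation) and the helper name has_io_reservation are hypothetical. It assumes the usual esub conventions: LSF passes the submission parameters in the file named by LSB_SUB_PARM_FILE, the resource string appears there as LSB_SUB_RES_REQ, and the esub rejects a job by exiting with the value of LSB_SUB_ABORT_VALUE. Check those against the documentation for your LSF version before relying on them.

```python
import os
import re
import sys


def has_io_reservation(parm_text):
    """Return True if the resource string contains rusage[... io=... ]."""
    match = re.search(r'^LSB_SUB_RES_REQ="?(.*?)"?$', parm_text, re.MULTILINE)
    if match is None:
        return False  # no -R string was given at all
    return re.search(r'rusage\[[^\]]*\bio\s*=', match.group(1)) is not None


def main():
    """Hypothetical esub body: return the exit status the script should use."""
    parm_file = os.environ.get("LSB_SUB_PARM_FILE")
    if parm_file is None:
        return 0  # not running under LSF; nothing to check
    with open(parm_file) as handle:
        parm_text = handle.read()
    if has_io_reservation(parm_text):
        return 0
    print("Please add rusage[io=...] to your -R string", file=sys.stderr)
    # The fallback of 1 is arbitrary; LSF normally sets LSB_SUB_ABORT_VALUE.
    return int(os.environ.get("LSB_SUB_ABORT_VALUE", "1"))
```

A real esub would finish with sys.exit(main()), and would probably apply the check only to the queues or users that are known to run I/O heavy work.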


Tim.

