On 24 Apr 2007, at 1:30 pm, Toon Knapen wrote:

Tim Cutts wrote:

but what if you have a bi-cpu bi-core machine to which you assign 4 slots. Now one slot is being used by a process which performs heavy IO. Suppose another process is launched that performs heavy IO. In that case the latter process should wait until the first one is done to avoid slowing down the efficiency of the system. Generally however, clusters take only time and memory requirements into account.
I think that varies. LSF records the current I/O of a node as one of its load indices, so you can request a node which is doing less than a certain amount of I/O. I imagine the same is true of SGE, but I wouldn't know.


Indeed, using SGE you could also take this into account. However, if someone submits 4 jobs, the jobs do not immediately start generating heavy I/O, so the scheduler might think that the 4 jobs can easily coexist on the same node. After a few minutes, though, all 4 jobs start eating disk bandwidth and slow the node down horribly. What would your suggestion be to solve this?

With LSF, you use resource reservation, using an rusage[] statement. Let's say, for example, that you want to keep I/O on the node below 15 MB/sec (just for argument's sake) and you know that your code performs I/O at 5 MB/sec. Let's also assume that the node can only sustain 15 MB/sec total (which is pathetic, I know, but serves to illustrate the example). This means you only want to start a job if the current I/O load is no more than 10 MB/sec. So, you tell LSF the following:

bsub -R"select[io <= 10000] rusage[io=5000]" ...

So, to show what LSF does in this case, on a single machine with four processors:

Given the conditions above, this machine would become overloaded if LSF started four jobs on it, but it can cope with three. This is what happens:

Initial state:  0 jobs running, io load is 0.  reserved io is 0.

load+reserved is <= 10000, so LSF starts a job.

State:  1 job running, io load is 0, reserved io is 5000

load+reserved is still <= 10000, so LSF starts another job.

State:  2 jobs running, io load is 0, reserved io is 10000

load+reserved is still <= 10000, so LSF starts another job

State:  3 jobs running, io load is 0, reserved io is 15000

load+reserved is now >10000, so LSF will not start the fourth job, even though a processor is available, and the three currently running jobs haven't started performing their massive I/O yet.
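The sequence above can be reproduced with a few lines of Python. This is only a sketch of the bookkeeping as I have described it, not LSF's actual scheduler code; the function name `simulate_admission` and its parameters are made up for illustration, and the measured I/O load is assumed to stay constant while jobs are being dispatched.

```python
def simulate_admission(slots, select_limit, rusage, measured_io=0):
    """Model the dispatch loop from the walkthrough (hypothetical helper).

    slots        -- number of job slots on the node
    select_limit -- threshold from select[io <= ...]
    rusage       -- amount reserved per job by rusage[io=...]
    measured_io  -- I/O load currently observed on the node (assumed constant)

    Returns (jobs_started, total_reserved).
    """
    running = 0
    reserved = 0
    while running < slots:
        # The scheduler compares observed load plus outstanding reservations
        # against the select[] threshold before starting another job.
        if measured_io + reserved > select_limit:
            break
        running += 1
        reserved += rusage
    return running, reserved


# Four slots, select[io <= 10000], rusage[io=5000], idle node.
print(simulate_admission(slots=4, select_limit=10000, rusage=5000))
```

With the numbers from the example this starts three jobs and leaves the fourth slot empty, with 15000 reserved.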

This scheme works quite well, but has some caveats:

1) It is still vulnerable to someone submitting an I/O intensive job without appropriate resource requirements (but that's back to my original point; if you don't give the scheduler the right information, it can't possibly schedule optimally). You can always implement an esub rule to force people to add the appropriate resources (I do precisely that for memory intensive jobs, using exactly this technique).

2) The syntax Platform use only works well for jobs which use a resource throughout their life, or for a limited period at the beginning. For jobs which only use the resource for a limited period at the end, you *have* to reserve it for the entire lifetime of the job. This isn't optimal, but without a time machine it's hard to do it any other way.
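As a rough illustration of the kind of check an esub rule for caveat 1 might make, here is a Python sketch that tests whether a resource requirement string reserves a given resource. The function name `reserves_resource` is hypothetical, and the parsing is deliberately naive (it assumes resources inside rusage[] are separated by commas or colons). In a real esub the string would come from the submission parameters LSF hands to the script; that plumbing is omitted here.

```python
import re


def reserves_resource(res_req, resource):
    """Return True if res_req has an rusage[...] section naming `resource`.

    Hypothetical helper: a naive check, not a full parser for LSF
    resource requirement strings.
    """
    for section in re.findall(r"rusage\[([^\]]*)\]", res_req):
        # Split the section into individual "name=value" terms.
        for term in re.split(r"[,:]", section):
            name = term.split("=")[0].strip()
            if name == resource:
                return True
    return False


# A submission with the reservation from the example passes the check;
# one with only a select[] clause does not.
print(reserves_resource("select[io <= 10000] rusage[io=5000]", "io"))
print(reserves_resource("select[io <= 10000]", "io"))
```

A site rule could reject or amend any submission for which the check fails, which is the same idea I use for memory-intensive jobs.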

Tim.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
