To solve the problem Toon describes below, where the scheduler believes 4 jobs can coexist on a single node but they cannot, because they are I/O (disk) bound jobs and will thrash the system:
There are several ways to do this in LSF; here are two.

1) Create a new resource for this type of job, call it 'widgets', and when the job is submitted tell LSF that this type of job consumes 1 widget. That solves the problem of LSF knowing the job is a 'big I/O job'. Then configure limits on the widget resource at the queue, host, or user level (or something more complicated). For example, configure the hosts so that this particular host cannot have more than 1 widget job running. With this configuration LSF knows that it cannot run more than 1. This is a simple solution and easy to understand, but it has limitations: if the job is really only I/O bound for a period of time, then the machine is underutilized once the job 'gets going'.

2) Use the LSF resource reservation mechanism. This is more complex, but essentially it boils down to telling LSF to 'bump up the resource usage' on a resource, making it look like more I/O is consumed than really is consumed for a given period of time, and applying a decay function so that the 'artificial bump in I/O' decreases over time. Once you have configured this, you submit the job to LSF telling it to use the resource reservation and decay. When the job starts, the scheduler 'believes that the job is taking lots of I/O' even though it is not yet doing so, and does not start a second one (since you configured the host to only start 1 when I/O is high). Then, as the artificial I/O load decays, the real I/O load kicks in (after 4 minutes in this example). So the scheduler won't schedule two jobs while the I/O is high. However, if this type of job is 'high I/O at the beginning but much less later', then two jobs can run together: the second one will start after the I/O (reservation + real I/O) falls below the threshold set in your queues or hosts for a job dispatch.
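To make option 2 a bit more concrete, here is a rough toy model in Python. It is not LSF code or LSF configuration, just a sketch of the idea: the apparent load on a host is the decaying reservation plus the real I/O, and a new job is dispatched only when that sum is below a threshold. Every name and number in it (RESERVED_IO, RESERVE_MINUTES, the 4-to-10-minute I/O phase, the threshold of 50) is made up for illustration, and I have assumed a simple linear decay.

```python
# Toy model of "reservation + decay"; NOT LSF code.
# All constants below are illustrative assumptions, in arbitrary units.
RESERVED_IO = 100.0   # artificial load added when a job starts
RESERVE_MINUTES = 8   # reservation decays linearly to zero over this long
REAL_IO = 80.0        # real load during the job's heavy-I/O phase
IO_PHASE = (4, 10)    # job does heavy I/O from minute 4 up to minute 10
THRESHOLD = 50.0      # host accepts a new job only below this load


def reserved_io(age, amount=RESERVED_IO, duration=RESERVE_MINUTES):
    """Artificial load still reserved for a job that started `age` minutes ago."""
    if amount <= 0 or duration <= 0 or age < 0 or age >= duration:
        return 0.0
    return amount * (1.0 - age / duration)


def real_io(age):
    """Hypothetical real I/O profile: quiet at first, heavy, then quiet again."""
    start, end = IO_PHASE
    return REAL_IO if start <= age < end else 0.0


def apparent_io(now, job_starts, amount=RESERVED_IO):
    """Load the scheduler sees at minute `now`: reservation plus real I/O."""
    return sum(reserved_io(now - s, amount) + real_io(now - s) for s in job_starts)


def first_dispatch_minute(job_starts, amount=RESERVED_IO, horizon=60):
    """First whole minute at which another job would be allowed on the host."""
    for minute in range(horizon + 1):
        if apparent_io(minute, job_starts, amount) < THRESHOLD:
            return minute
    return None


if __name__ == "__main__":
    print("with reservation:   ", first_dispatch_minute([0]))
    print("without reservation:", first_dispatch_minute([0], amount=0.0))
```

With these made-up numbers, a second job is held back until the first job's I/O phase is over, whereas with no reservation it would be dispatched immediately, which is exactly the situation Toon describes. Note that in this model the reservation has to stay above the threshold until the real I/O shows up, which is why the decay period here is assumed to be longer than the 4 quiet minutes.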
Finally, if you don't like those mechanisms you can create any type of 'resource or attribute' you want and apply them as a limit to LSF scheduling. So if needed you can create more complex I/O resources and use them for your scheduling.

Regards,
Bill.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Toon Knapen
Sent: Tuesday, April 24, 2007 8:31 AM
To: Tim Cutts
Cc: 'beowulf@beowulf.org'
Subject: Re: [Beowulf] scheduler policy design

Tim Cutts wrote:
>> but what if you have a bi-cpu bi-core machine to which you assign 4
>> slots. Now one slot is being used by a process which performs heavy
>> IO. Suppose another process is launched that performs heavy IO. In
>> that case the latter process should wait until the first one is done
>> to avoid slowing down the efficiency of the system. Generally however,
>> clusters take only time and memory requirements into account.
>
> I think that varies. LSF records the current I/O of a node as one of
> its load indices, so you can request a node which is doing less than a
> certain amount of I/O. I imagine the same is true of SGE, but I
> wouldn't know.
>

Indeed, using SGE you could also take this into account. However if someone submits 4 jobs, the jobs do not directly start to generate heavy I/O. So the scheduler might think that the 4 jobs can easily coexist on this same node. However, after a few minutes all 4 jobs start eating disk BW and slow the node down horribly.

What would your suggestion be to solve this ?

thanks,
toon
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf