On 25 Apr 2007, at 8:42 am, Toon Knapen wrote:

> Interesting. However, this approach requires that the I/O profile of the application is known.

Absolutely.

> Additionally, it requires the users of the application (who are generally not IT guys) to know and understand this info and pass it on to the scheduler when they launch their app.

Absolutely.

> In your experience, do you manage to convince real-life users to provide this info?

Not easily.  :-)

And this is the problem with getting scheduling right, and exactly what we were saying at the beginning of this discussion. You can't hope to schedule optimally if the scheduler doesn't know the profile of the application; the more information it has, the better a job it will do. But if your users, like mine, can't or won't supply this information, then you're very limited in what you can achieve. Your system will also be vulnerable to denial of service: strange mixes of jobs starting on the machines will cause them to run out of various resources, and there is basically nothing you can do about it.

The compromise we ended up with is this set of LSF queues on our system (a cluster with about 1500 job slots):

QUEUE_NAME     PRIO STATUS       MAX  JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
yesterday       500 Open:Active  200    10    -    -     1     0     1     0
normal           30 Open:Active    -     -    -    -   281   110   171     0
hugemem          30 Open:Active    -     -    -    -     3     0     3     0
long              3 Open:Active    -     -    -    -  4022  2987  1035     0
basement          1 Open:Active  300   200    -    -   127     0   127     0

yesterday:

a special-purpose, high-priority queue for the "I need it yesterday" crowd. No run-time limits, but very limited in terms of how many slots each user can occupy.

normal:

queue intended for shortish jobs (around 1 hour). Absolute wall clock limit of 8 hours, after which jobs are killed.

long:

queue for longer jobs with an absolute wall clock limit of 24 hours.

hugemem:

special purpose queue for the two large memory SGI Altix nodes. Users submitting jobs to this queue *must* supply memory requirements; the submission is rejected if they do not (see the example submission just after this list).

basement:

queue for long running or low priority jobs. No time limits, but can't use more than a small fraction of the total cluster.
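
To make the hugemem requirement concrete, a valid submission has to look
something like this (hypothetical job name and sizes; remember that -M
takes KB while the select/rusage values take MB):

   bsub -q hugemem -M 16000000 -R'select[mem>16000] rusage[mem=16000]' ./my_big_job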

All the queues except hugemem also have a default memory limit of 1.9 GB; any job exceeding this limit is killed. Users who want to raise this limit can, up to 7.9 GB, but the same mechanism as on the hugemem queue then forces them to supply proper memory resource requirements.
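
For anyone wanting to set up something similar, here's a minimal sketch of
what queue definitions along these lines might look like in lsb.queues.
These are illustrative values, not our actual configuration; RUNLIMIT is
hours:minutes and MEMLIMIT is in KB:

   Begin Queue
   QUEUE_NAME   = normal
   PRIORITY     = 30
   RUNLIMIT     = 8:00        # jobs killed after 8 hours wall clock
   MEMLIMIT     = 1900000     # default memory limit, ~1.9 GB (KB units)
   DESCRIPTION  = shortish jobs, around an hour
   End Queue

   Begin Queue
   QUEUE_NAME   = yesterday
   PRIORITY     = 500
   QJOB_LIMIT   = 200         # MAX: total slots in the queue
   UJOB_LIMIT   = 10          # JL/U: slots per user
   MEMLIMIT     = 1900000
   DESCRIPTION  = high priority, strictly rationed per user
   End Queue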

Here's an example of what happens when a user sets their own memory limit but doesn't supply the resource requirements:

--- EXAMPLE ---
14:07:31 [EMAIL PROTECTED]:~$ bsub -M 6000000 uname -a
Job submission rejected.


You are specifying your own memory limit, so you must also supply
select[mem] and rusage[mem] resource requirement parameters.  For
example:

   -M2000000 -R'select[mem>2000] rusage[mem=2000]'

Remember that memory limits are set in KB, resource memory in MB.
Sorry about that.  Blame Platform.

If you do not understand what this means, read the lsfintro manpage and
the following web page:

http://www.wtgc.org/IT/ISG/lsf/lsf_intro.shtml#resources

If you still don't understand after that, contact ssg-isg(at) sanger.ac.uk

Request aborted by esub. Job not submitted.
--- EXAMPLE ---
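
The rejection above is done by an esub, LSF's job-submission filter hook.
Here's a minimal sketch of how such a check might be written, under some
assumptions on my part: that the submission parameters arrive as LSB_SUB_*
assignments in the file named by $LSB_SUB_PARM_FILE, that -M appears there
as LSB_SUB_RLIMIT_RSS, and that rejection is signalled by exiting with
$LSB_SUB_ABORT_VALUE (standard esub plumbing; the real script naturally
says rather more to the user):

   #!/bin/sh
   # esub sketch: refuse jobs that set -M without matching -R clauses.

   . $LSB_SUB_PARM_FILE     # pulls in LSB_SUB_RLIMIT_RSS, LSB_SUB_RES_REQ, ...

   if [ -n "$LSB_SUB_RLIMIT_RSS" ]; then    # user supplied -M
       ok=yes
       case "$LSB_SUB_RES_REQ" in *"select[mem"*)  ;; *) ok=no ;; esac
       case "$LSB_SUB_RES_REQ" in *"rusage[mem="*) ;; *) ok=no ;; esac
       if [ "$ok" = no ]; then
           # stderr is shown to the submitting user, as in the transcript above
           echo "You are specifying your own memory limit, so you must also" 1>&2
           echo "supply select[mem] and rusage[mem] resource requirements." 1>&2
           exit $LSB_SUB_ABORT_VALUE        # tells LSF to abort the submission
       fi
   fi
   exit 0

A submission that sets -M and supplies both clauses, like the -M2000000
example in the rejection message, sails straight through.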


All this is designed so that users who can't or won't supply detailed parameters to LSF can still submit work, but they are either limited in how many jobs they can run at once (in the yesterday and basement queues), or they run the risk of their jobs being killed if they go astray and use too much time or memory (in the normal and long queues).

Thus, it gives the users an incentive to understand their code and to use the cluster carefully and responsibly. Until we put the hard run limits in place, the cluster was being brought to its knees at least once a week by users simply being careless, which is why we eventually had to be somewhat more draconian. It's worked, though; the cluster has not had a similar DoS event since we put these rules in place.

Regards,

Tim
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf

Reply via email to