Hi Andreas,

You could define a generic consumable resource (GRES) per node and have the
scheduler take account of requests for it. In principle you could do this for,
say, interface_bandwidth or io_bw and try to use real numbers, but in practice
users don't know how much they need or will use, and admins don't have the
capability to set strong limits anyway. So you may as well schedule a more
abstract resource - let's call it 'eins'.

Define one 'eins' GRES per node and have the jobs you want separated request
gres=eins:1. Those jobs will then run on separate nodes. Jobs that don't
request gres=eins will not be separated.
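As a rough sketch (node names and counts are illustrative - adjust to your
site, and note that GRES scheduling needs a consumable select plugin such as
select/cons_res):

```
# slurm.conf (fragment)
GresTypes=eins
NodeName=node[01-56] CPUs=28 Gres=eins:1 State=UNKNOWN

# gres.conf on each node
Name=eins Count=1
```

Then submit the jobs you want spread out with something like:

```
sbatch -J iojob -n 1 --gres=eins:1 jobscript.sh
```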

If you want a bit more flexibility, try a resource with a larger count per node
(a different resource name, say 'vier', with 4 per node). Jobs could then
request gres=vier:1 (up to 4 will run on a node), gres=vier:2 (only 2 per
node), and so on (but not gres=vier:5!).
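For example (again illustrative - the resource name and count are yours to
choose, configured as Gres=vier:4 in slurm.conf and Name=vier Count=4 in
gres.conf):

```
sbatch --gres=vier:1 -n 1 job.sh   # up to 4 such jobs share a node
sbatch --gres=vier:2 -n 1 job.sh   # at most 2 such jobs per node
sbatch --gres=vier:4 -n 1 job.sh   # claims the whole 'vier' pool on its node
```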

Maybe name the resource 'iocount' and expect heavy I/O users to request only 1.
Then you can tune what you make available on nodes later without requiring
users to change their behaviour.

You could combine this with extra partitions and/or a filter to set defaults 
and make the choice/usage easier for your users.
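If you enable the job_submit/lua plugin, a filter can apply the default for
you. A minimal sketch, assuming the abstract resource is named 'eins' and the
jobs to separate are submitted with -J iojob (field names match the Lua plugin
API of Slurm releases current as of this writing):

```lua
-- job_submit.lua: tag iojob submissions with gres=eins:1 by default
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.name == "iojob" and
       (job_desc.gres == nil or job_desc.gres == "") then
        job_desc.gres = "eins:1"
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

That way users keep submitting exactly as they do now.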

Gareth 

-----Original Message-----
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Andreas Hilboll
Sent: Tuesday, 8 May 2018 2:58 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Limit number of specific concurrent jobs per node

Dear SLURM experts,

we have a cluster of 56 nodes with 28 cores each.  Is it possible to limit the 
number of jobs of a certain name which concurrently run on one node, without 
blocking the node for other jobs?

For example, when I do

   for filename in runtimes/*/jobscript.sh; do
     sbatch -J iojob -n 1 "$filename"
   done

How can I assure that only one of these jobs runs per node?  The jobs are very 
lightweight computationally and only use 1 core each, but since they are rather 
heavy on the I/O side, I'd like to ensure that when a job runs, it doesn't have 
to share the available I/O bandwidth with other jobs.  (This would actually 
work since usually our other jobs are not I/O intensive.)

From reading the manpage, I couldn't figure out how to do this.


Sunny greetings,
 Andreas

