Hello all,

We're mostly a GPU compute shop, and we've been happy with Slurm for the last three years, but we think it would benefit from the following two features:

1. Allow preemption within the same QOS, all else being equal, based on job priority.

2. Have the job size calculation take into account the number of GPUs allocated to the job; in a GPU cluster, the most valuable currency is the GPU, not the CPU. Perhaps even parameterize the job size factor so the user could choose what to emphasize in the calculation: CPU, GPU, or memory (a rough sketch of what we mean follows below).
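
To make #2 concrete, here is a rough Python sketch of the kind of parameterized job size factor we have in mind. It is purely our own illustration, not existing Slurm code; the names and the weighting scheme are assumptions.

    # Sketch of a parameterized job size factor (our illustration, not Slurm code).
    # The per-resource weights let the user choose what to emphasize.
    def job_size_factor(requested, total, weights):
        """requested/total: dicts keyed by 'cpu', 'gpu', 'mem'; weights sum to 1.0."""
        return sum(weights[r] * requested[r] / total[r]
                   for r in ('cpu', 'gpu', 'mem'))

    # Example: a GPU-heavy emphasis.
    weights = {'cpu': 0.1,  'gpu': 0.8, 'mem': 0.1}
    job     = {'cpu': 32,   'gpu': 8,   'mem': 256}    # requested by the job
    cluster = {'cpu': 1024, 'gpu': 64,  'mem': 8192}   # cluster totals
    print(job_size_factor(job, cluster, weights))      # ~0.106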


If this is not the right place for such requests, I would appreciate a pointer in the right direction.


Justification:

It's pretty obvious why we'd like #2.

We want #1 because we believe it would allow a more natural maximization of cluster usage. A user X could grab the whole cluster while it is free, and another user Y, arriving later, could still get jobs in by preempting some of X's jobs. We are assuming that X's fairshare score decreases as resources are consumed, so Y's jobs end up with higher priority, and that requeue, checkpoint and restart are employed. We also think this would make the system fairer in the long term, essentially time-slicing usage through priority-based preemption.
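
To illustrate the rule we are asking for in #1, here is a minimal sketch of the decision we have in mind (again our own illustration, not a patch; the Job type is hypothetical, and the priority is assumed to be the multifactor priority with fairshare included):

    from dataclasses import dataclass

    @dataclass
    class Job:
        qos: str
        priority: int  # assumed multifactor priority, fairshare included

    def can_preempt(pending: Job, running: Job) -> bool:
        # Within the same QOS, all else being equal, only priority decides.
        return pending.qos == running.qos and pending.priority > running.priority

    # Y's later job, boosted by fairshare, could preempt one of X's running jobs:
    print(can_preempt(Job('normal', 12000), Job('normal', 9000)))  # True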

Relu

