[slurm-users] qos between partitions

2017-11-22 Thread Nadav Toledo
Hey everyone, Perhaps I am asking a basic question, but I really don't understand how preemption works. The scenario (simplified for the example) is like this: Nodes: NodeName=A1  CPUS=2 RealMemory=128906 TmpDisk=117172 NodeName=A2  CPUS=30 RealMemory=128…
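
A minimal sketch of QOS-based preemption between partitions, assuming the preempt/qos plugin; the partition and QOS names below are hypothetical:

    # slurm.conf (hypothetical names and values)
    PreemptType=preempt/qos
    PreemptMode=REQUEUE
    PartitionName=batch    Nodes=A1,A2 Default=YES
    PartitionName=priority Nodes=A1,A2 PriorityTier=2

    # sacctmgr: let jobs under the "high" QOS preempt jobs under "low"
    sacctmgr modify qos name=high set Preempt=low

Jobs would then be submitted with, e.g., sbatch --qos=high to gain preemption rights.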

Re: [slurm-users] giving smaller jobs higher priority

2017-11-22 Thread Jessica Nettelblad
We have:
* a high-priority QOS for short jobs; the QOS is set at submission time by the user or in the Lua script (see the sketch below)
* partitions for jobs of certain lengths or other requirements; sometimes several partitions overlap
* a script that adjusts priorities according to our policies every 5 minutes
By comb…
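
For anyone curious what the Lua piece can look like, here is a minimal job_submit.lua sketch; the "short" QOS name and the 60-minute cutoff are hypothetical:

    -- job_submit.lua: route short jobs to a high-priority QOS at submission time
    function slurm_job_submit(job_desc, part_list, submit_uid)
        -- time_limit is in minutes; anything up to an hour gets the short QOS
        if job_desc.time_limit ~= nil and job_desc.time_limit <= 60 then
            job_desc.qos = "short"
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end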

Re: [slurm-users] giving smaller jobs higher priority

2017-11-22 Thread Satrajit Ghosh
hi sam, thanks for that pointer. we already have: PriorityFavorSmall=YES PriorityFlags=SMALL_RELATIVE_TO_TIME but small jobs still seem to be held up. that's because cores matter more than nodes in our usage scenarios. 99% of jobs request one node, so ideally we wanted to assign negative we…
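
For reference, the job-size factor is scaled by PriorityWeightJobSize in slurm.conf; a hedged sketch of the multifactor settings involved (the weights are hypothetical):

    # slurm.conf (hypothetical weights)
    PriorityType=priority/multifactor
    PriorityFavorSmall=YES
    PriorityFlags=SMALL_RELATIVE_TO_TIME
    PriorityWeightJobSize=100000
    PriorityWeightAge=1000
    PriorityWeightFairshare=10000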

Re: [slurm-users] giving smaller jobs higher priority

2017-11-22 Thread Sam Gallop (NBI)
Hi Satra, Have a look at PriorityFavorSmall (in slurm.conf). It may fit your needs. I've not used it myself, so I can't say whether it'll do exactly what you're after. --- Samuel Gallop

[slurm-users] giving smaller jobs higher priority

2017-11-22 Thread Satrajit Ghosh
slurm has a way of giving larger jobs more priority. is it possible to do the reverse? i.e., is there a way to configure priority so that smaller jobs (those that use fewer resources) get higher priority than bigger ones? cheers, satra resources: can be a weighted combination depending on system resources avai…
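
One way Slurm expresses "a weighted combination of resources" is per-partition TRES billing weights, used when computing a job's resource usage for fairshare and limits; a hedged example (the weights are hypothetical):

    # slurm.conf: bill 1 unit per CPU and 0.25 units per GB of memory
    PartitionName=normal Nodes=A1,A2 TRESBillingWeights="CPU=1.0,Mem=0.25G"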

[slurm-users] Increasing MaxArraySize

2017-11-22 Thread Loris Bennett
Hi, In the documentation on job arrays, https://slurm.schedmd.com/job_array.html, it says: "Be mindful about the value of MaxArraySize as job arrays offer an easy way for users to submit large numbers of jobs very quickly." How much do I have to worry about this if I am using fairshare sched…
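
For context, a hedged sketch of the knob in question and a typical array submission (the values are hypothetical):

    # slurm.conf: array indices must be strictly less than MaxArraySize (default 1001)
    MaxArraySize=10001

    # submit a 10,000-task array, throttled to 50 tasks running at once
    sbatch --array=0-9999%50 job.sh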

Re: [slurm-users] GPU job still running after SLURM job is killed

2017-11-22 Thread John Hearns
Matt, I saw a similar situation with a PBS job recently. A process which is writing to disk cannot be killed (it is in D state, uninterruptible sleep). So the job ended, but PBS logged that it could not kill the process. I would look in detail at the Slurm logs at the point where that job is being killed, and you might g…
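
A quick way to check whether a leftover process is stuck in uninterruptible sleep (the PID below is hypothetical):

    # show state and kernel wait channel; "D" means uninterruptible sleep
    ps -o pid,stat,wchan,cmd -p 12345

    # or list every D-state process on the node
    ps -eo pid,stat,cmd | awk '$2 ~ /^D/'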

[slurm-users] GPU job still running after SLURM job is killed

2017-11-22 Thread Matt McKinnon
Hi All, I'm wondering if you've seen this issue around; I can't seem to find anything on it. We have an NVIDIA DGX-1 that we run SLURM on in order to queue up jobs on the GPUs there, but we're running into an issue: 1) launch a SLURM job (assume job id = 12345) 2) start a program…
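
When diagnosing this, nvidia-smi can show which PIDs still hold the GPUs, and scontrol can list the PIDs Slurm believes belong to the job; the job id 12345 is the one from the example above:

    # list compute processes still using the GPUs
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

    # on the compute node: PIDs Slurm is tracking for the job
    scontrol listpids 12345

If the orphaned process is not in the job's process-tracking group, ProctrackType=proctrack/cgroup in slurm.conf is the usual way to ensure everything a job spawns gets signalled at job end.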