Re: [slurm-users] Nodes remaining in drain state once job completes

2019-03-18 Thread Pawel R. Dziekonski
On 18/03/2019 23.07, Eric Rosenberg wrote: > [2019-03-15T09:48:43.000] update_node: node rn003 reason set to: Kill task failed This usually happens for me when one of the shared filesystems is overloaded and processes are stuck in uninterruptible sleep (D), thus unable to terminate. Your reason
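A rough sketch of how I usually deal with it once the stuck processes have finally exited (rn003 is the node named in the log above; the timeout value is only an illustration, not a recommendation):

    # see why slurmctld drained the node
    sinfo -R
    # clear the node by hand once nothing is left in D state
    scontrol update NodeName=rn003 State=RESUME
    # slurm.conf knob worth reviewing: give the job longer to die
    # before Slurm declares "Kill task failed"
    UnkillableStepTimeout=120

The UnkillableStepTimeout line goes in slurm.conf, not on the command line, and it only hides the symptom if an overloaded filesystem is the real cause.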

[slurm-users] How to give different quota to different users with a QOS?

2019-03-18 Thread Jaekyeom Kim
Hi, I made two QOSes in Slurm to define two levels of priority and preemption. For instance, I can limit the maximum number of high-priority jobs (running or pending) submitted by each user to 5 by setting MaxSubmitJobsPerUser=5 for the high-priority QOS. But if I want to give more quota
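To make the setup concrete, here is roughly what I have now and what I would like to achieve (QOS and user names are made up for illustration; whether a per-user association limit interacts with the QOS the way I want is exactly my question):

    # QOS-wide cap: each user may have at most 5 jobs submitted in the high QOS
    sacctmgr modify qos highprio set MaxSubmitJobsPerUser=5
    # desired: a larger quota for one particular user, e.g. on their association
    sacctmgr modify user alice set MaxSubmitJobs=20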

[slurm-users] Nodes remaining in drain state once job completes

2019-03-18 Thread Eric Rosenberg
Hello, I've set up a few nodes with Slurm to test and am having trouble. It seems that once a job has hit its wall time, the node it ran on enters the comp state and then remains in the drain state until I manually set the state to resume. Looking at the Slurm log on the head node, I see the
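In case it helps, this is roughly how I have been inspecting and clearing the nodes each time (the node name is just an example):

    # show the drain reason recorded by slurmctld
    sinfo -R
    scontrol show node rn003 | grep -i reason
    # manually put the node back in service
    scontrol update NodeName=rn003 State=RESUME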