Re: [slurm-users] Slurm database failure messages

2019-05-07 Thread Pawel R. Dziekonski
On 07/05/2019 13.47, David Baker wrote:
> We are experiencing quite a number of database failures.
> [root@blue51 slurm]# less slurmdbd.log-20190506.gz | grep failed
> [2019-05-05T04:00:05.603] error: mysql_query failed: 1213 Deadlock found when trying to get lock; try restarting transaction

Re: [slurm-users] Increasing job priority based on resources requested.

2019-04-21 Thread Pawel R. Dziekonski
Hi, you can always come up with some kind of submit "filter" that assigns constraints to jobs based on requested memory. This way you can force smaller-memory jobs onto low-memory nodes only and keep large-memory nodes free from trash jobs. The disadvantage is that large mem nodes woul
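
A minimal sketch of the submit "filter" idea, written as a wrapper that prepends a constraint to the sbatch arguments. The threshold and the feature names "lowmem" and "bigmem" are assumptions made for illustration; they would have to match features actually defined on the nodes. In practice the same logic could also live in a job_submit plugin.

```python
from __future__ import annotations

# Hypothetical threshold (64 GiB, expressed in MB); not a Slurm default.
LOW_MEM_LIMIT_MB = 64 * 1024


def pick_constraint(requested_mem_mb: int, limit_mb: int = LOW_MEM_LIMIT_MB) -> str:
    """Return the node feature a job should be constrained to.

    "lowmem" and "bigmem" are assumed feature names.
    """
    return "lowmem" if requested_mem_mb <= limit_mb else "bigmem"


def add_constraint(sbatch_args: list[str], requested_mem_mb: int) -> list[str]:
    """Return a copy of the sbatch arguments with --constraint prepended.

    Arguments that already carry a constraint (-C or --constraint) are
    returned unchanged, so explicit user choices are not overridden.
    """
    if any(a.startswith("--constraint") or a.startswith("-C") for a in sbatch_args):
        return list(sbatch_args)
    return [f"--constraint={pick_constraint(requested_mem_mb)}", *sbatch_args]
```

For example, add_constraint(["job.sh"], 4096) yields ["--constraint=lowmem", "job.sh"], which sends the job to nodes tagged with the assumed "lowmem" feature.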

Re: [slurm-users] Nodes remaining in drain state once job completes

2019-03-18 Thread Pawel R. Dziekonski
On 18/03/2019 23.07, Eric Rosenberg wrote:
> [2019-03-15T09:48:43.000] update_node: node rn003 reason set to: Kill task failed
This usually happens for me when one of the shared filesystems is overloaded and processes are stuck in uninterruptible sleep (D), thus unable to terminate. Your reason
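
One way to check a node for processes stuck in uninterruptible sleep is to read the state field from /proc/<pid>/stat. The sketch below assumes a Linux node with /proc mounted; the helper names are made up for illustration and are not part of Slurm.

```python
import os


def parse_state(stat_line: str) -> str:
    """Extract the one-letter process state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so the state is the first field after the
    last ')'.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def d_state_pids(proc_root: str = "/proc") -> list:
    """Return the PIDs currently in uninterruptible sleep (state D)."""
    pids = []
    if not os.path.isdir(proc_root):
        return pids  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                if parse_state(fh.read()) == "D":
                    pids.append(int(entry))
        except (OSError, IndexError):
            # The process exited while we were reading, or the line was malformed.
            continue
    return pids
```

Calling d_state_pids() on an affected node while a job is being killed lists the processes that cannot be terminated until the I/O they are waiting on completes.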