On 07/05/2019 13.47, David Baker wrote:
> We are experiencing quite a number of database failures.
> [root@blue51 slurm]# less slurmdbd.log-20190506.gz | grep failed
> [2019-05-05T04:00:05.603] error: mysql_query failed: 1213 Deadlock found when
> trying to get lock; try restarting transaction
Hi,
you can always come up with some kind of submit "filter" that
assigns constraints to jobs based on requested memory. In this way you
can force smaller memory jobs to go only to low memory nodes and keep
large memory nodes free from trash jobs.
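
A rough sketch of the decision such a filter would make, written in Python only for illustration (a real filter would probably live in a job submit plugin or an sbatch wrapper). The feature name "lowmem" and the 64 GB threshold are assumptions of mine; use whatever features your nodes actually carry:

```python
# Hypothetical values: adjust the threshold and the feature name to your site.
LOWMEM_LIMIT_MB = 64 * 1024
LOWMEM_FEATURE = "lowmem"


def parse_mem_mb(value):
    """Convert a Slurm-style memory string such as '4G' or '512' to megabytes."""
    units = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}
    value = value.strip().upper()
    if value and value[-1] in units:
        return int(float(value[:-1]) * units[value[-1]])
    # A bare number is taken as megabytes, the default unit for --mem.
    return int(value)


def pick_constraint(mem_mb):
    """Return the feature to require for a job, or None to leave it unconstrained."""
    if mem_mb <= LOWMEM_LIMIT_MB:
        return LOWMEM_FEATURE
    # Large jobs only fit on the large memory nodes anyway.
    return None
```

A wrapper could then add --constraint=lowmem to the sbatch call whenever pick_constraint() returns a feature.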
The disadvantage is that large mem nodes woul
On 18/03/2019 23.07, Eric Rosenberg wrote:
> [2019-03-15T09:48:43.000] update_node: node rn003 reason set to: Kill task
> failed
This usually happens for me when one of the shared filesystems
is overloaded and processes are stuck in uninterruptible sleep
(D), thus unable to terminate.
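
To check whether that is what is happening on a node, one option is to list the processes currently in state D. A minimal sketch in Python, assuming a Linux /proc filesystem; the function names are mine, not part of Slurm:

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]


def d_state_processes(proc_root="/proc"):
    """Return a list of (pid, comm) for processes in uninterruptible sleep."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # unreadable or malformed entry, e.g. the process exited mid-scan
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

If job processes show up here and stay there, the filesystem they are waiting on is the first place to look.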
Your reason