Re: [slurm-users] draining nodes due to failed killing of task?

2021-08-07 Thread Adrian Sevcenco
Hi!

On 8/8/21 3:19 AM, Chris Samuel wrote:
> On Friday, 6 August 2021 12:02:45 AM PDT Adrian Sevcenco wrote:
>> i was wondering why a node is drained when killing of task fails and how can i disable it? (i use cgroups) moreover, how can the killing of task fails? (this is on slurm 19.05)
> Slurm ha

Re: [slurm-users] draining nodes due to failed killing of task?

2021-08-07 Thread Chris Samuel
On Friday, 6 August 2021 12:02:45 AM PDT Adrian Sevcenco wrote:
> i was wondering why a node is drained when killing of task fails and how can
> i disable it? (i use cgroups) moreover, how can the killing of task fails?
> (this is on slurm 19.05)

Slurm has tried to kill processes, but they refuse

Re: [slurm-users] 19.05->20.11 update:: slurmdbd failure - SOLVED

2021-08-07 Thread Adrian Sevcenco
On 8/7/21 9:50 PM, Adrian Sevcenco wrote:
> Hi! I just upgraded slurm from 19.05 to 20.11 (all services stopped before) and now, after checking the configuration slurmdbd do not start anymore:
> [2021-08-07T21:42:01.890] error: Database settings not recommended values: innodb_buffer_pool_size innodb

Re: [slurm-users] draining nodes due to failed killing of task?

2021-08-07 Thread Adrian Sevcenco
On 8/6/21 6:06 PM, Willy Markuske wrote:
> Adrian and Diego,

Hi!

> Are you using AMD Epyc processors when viewing this issue? I've been having the same issue but only on dual AMD Epyc

i do have some epyc nodes, but the cpu proportion is 50%/50% with broadwell cores .. and i do not see a correlat

[slurm-users] 19.05->20.11 update:: slurmdbd failure

2021-08-07 Thread Adrian Sevcenco
Hi! I just upgraded slurm from 19.05 to 20.11 (all services stopped before) and now, after checking the configuration slurmdbd do not start anymore:
[2021-08-07T21:42:01.890] error: Database settings not recommended values: innodb_buffer_pool_size innodb_log_file_size innodb_lock_wait_timeout [