Re: [slurm-users] backfill scheduler does not work for heterogeneous jobs (version 17.11)

2018-11-30 Thread Kenneth Roberts
The Limitations section of the heterogeneous job support page mentions backfill. https://slurm.schedmd.com/heterogeneous_jobs.html#limitations Maybe there’s some information there to help? Ken From: slurm-users On Behalf Of Ana Jokanovic Sent: Thursday, November 29, 2018 4

Re: [slurm-users] Wedged nodes from cgroups, OOM killer, and D state process

2018-11-30 Thread John Hearns
Chris, I have delved deep into the OOM killer code and its interaction with cpusets in the past (*). That experience is not really relevant! However, I always recommend looking at the sysctl parameter min_free_kbytes: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/perform

Re: [slurm-users] Wedged nodes from cgroups, OOM killer, and D state process

2018-11-30 Thread Ole Holm Nielsen
On 29-11-2018 19:27, Christopher Benjamin Coffey wrote: We've been noticing an issue with nodes from time to time that become "wedged", or unusable. This is a state where ps and w hang. We've been looking into this for a while when we get time and finally put some more effort into it yesterday