The Limitations section of the heterogeneous job support page mentions
backfill.
https://slurm.schedmd.com/heterogeneous_jobs.html#limitations
Maybe there’s some information there to help?
Ken
From: slurm-users On Behalf Of Ana Jokanovic
Sent: Thursday, November 29, 2018 4
Chris, I have delved deep into the OOM killer code and interaction with
cpusets in the past (*).
That experience is not really relevant!
However, I always recommend looking at this sysctl parameter:
vm.min_free_kbytes
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/perform
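For anyone who wants to check that setting across nodes, here is a minimal sketch that reads the current min_free_kbytes value and shows it as a share of total memory. It assumes a Linux node with the usual /proc/sys/vm/min_free_kbytes and /proc/meminfo files. The helper names (parse_meminfo_kb, reserve_fraction, report) are made up for illustration, and the script does not suggest what a good value would be.

```python
from pathlib import Path


def parse_meminfo_kb(text, key="MemTotal"):
    """Return the value in kB for `key` from /proc/meminfo-style text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == key:
            # Lines look like "MemTotal:       16384000 kB"
            return int(rest.split()[0])
    raise KeyError(key)


def reserve_fraction(min_free_kb, mem_total_kb):
    """Fraction of total memory covered by min_free_kbytes."""
    return min_free_kb / mem_total_kb


def report():
    """Print the current setting on this node (Linux only)."""
    min_free_kb = int(Path("/proc/sys/vm/min_free_kbytes").read_text())
    mem_total_kb = parse_meminfo_kb(Path("/proc/meminfo").read_text())
    frac = reserve_fraction(min_free_kb, mem_total_kb)
    print(f"min_free_kbytes = {min_free_kb} kB "
          f"({frac:.3%} of {mem_total_kb} kB MemTotal)")
```

Calling report() on a compute node prints the value currently in effect; the same number is available with "sysctl vm.min_free_kbytes".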
On 29-11-2018 19:27, Christopher Benjamin Coffey wrote:
We've been noticing an issue with nodes that from time to time become "wedged",
or unusable. This is a state where ps and w hang. We've been looking into this for a
while when we get time, and finally put some more effort into it yesterday