Re: [slurm-users] Troubles with cgroups

2023-05-17 Thread Hermann Schwärzler
Hi everybody, I would like to give you a quick update on this problem (systems hanging when swapping occurs due to cgroup memory limits): We opened a case with Red Hat's customer support. After some back and forth they were able to reproduce the problem. Last week they told us to upgrade to ve

Re: [slurm-users] Troubles with cgroups

2023-03-21 Thread Jason Simms
Hello Hermann, Thanks for following up about this. What you say makes sense: at Lafayette, we didn't experience the issue until upgrading to a Slurm version that supported cgroups/v2, and here at Swarthmore, we are still on a version of Slurm that doesn't, and we don't have the issue (both Rocky 8)

Re: [slurm-users] Troubles with cgroups

2023-03-21 Thread Hermann Schwärzler
Hi Jason, thank you for your reply. From what I can tell, your problem *is* the same as ours. BTW: we were already talking about disabling swap on our nodes as a last resort. :-) In the meantime we have made some new findings: we can trigger the error when (with cgroups/v2) we set memory.high and m

Re: [slurm-users] Troubles with cgroups

2023-03-17 Thread Jason Simms
Hello, This isn't precisely related, but I can say that we were having strange issues with system load spiking to the point that the nodes became unresponsive and likewise needed a hard reboot. After several tests and working with our vendor, on nodes where we entirely disabled swap, the problems c
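
For anyone comparing nodes along the lines discussed in this thread (cgroups/v2 in use or not, swap enabled or not), here is a minimal sketch of a per-node check. It is not taken from any of the messages above. The function names are made up for illustration, and it assumes the usual Linux locations: /proc/swaps for active swap areas and /sys/fs/cgroup as the cgroup mount point, where a unified (v2) hierarchy exposes a cgroup.controllers file at its root.

```python
from pathlib import Path


def parse_swap_devices(proc_swaps_text: str) -> list[str]:
    """Return the names of active swap areas from /proc/swaps content."""
    lines = proc_swaps_text.strip().splitlines()
    # The first line is the column header (Filename, Type, Size, ...).
    return [line.split()[0] for line in lines[1:] if line.strip()]


def uses_cgroup_v2(cgroup_root: Path = Path("/sys/fs/cgroup")) -> bool:
    """True if the root looks like a unified (v2) cgroup hierarchy."""
    # Assumption: only a cgroup v2 mount has cgroup.controllers at its root.
    return (cgroup_root / "cgroup.controllers").is_file()


def node_report(
    proc_swaps: Path = Path("/proc/swaps"),
    cgroup_root: Path = Path("/sys/fs/cgroup"),
) -> dict:
    """Summarise the two properties discussed in the thread for this node."""
    swaps_text = proc_swaps.read_text() if proc_swaps.is_file() else ""
    swap_devices = parse_swap_devices(swaps_text)
    return {
        "cgroup_v2": uses_cgroup_v2(cgroup_root),
        "swap_enabled": bool(swap_devices),
        "swap_devices": swap_devices,
    }
```

Calling `node_report()` on a compute node returns a small dictionary that can be collected across the cluster. It only reports the configuration and does not attempt to reproduce or diagnose the hang itself.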