[slurm-users] jobs getting stuck in CG

Ricardo Román-Brenes via slurm-users Mon, 10 Feb 2025 00:30:20 -0800

Hello everyone.

I have a cluster of 16 nodes, 4 of which have GPUs, with no particular
configuration to manage them.
The filesystem is Gluster, and authentication is via slapd/munge.


My problem is that jobs very frequently get stuck in the CG (completing)
state, at least one job a day. I have no idea why this happens. Manually
killing the slurmstepd process releases the node, but this is in no way a
manageable solution. Has anyone experienced this (and fixed it)?
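
For reference, here is a minimal sketch of how the jobs stuck in CG could be
listed together with their nodes, so it is clear where the leftover step
process lives. It assumes squeue is on the PATH of the machine running it.
The function names (parse_completing, list_completing_jobs) are made up for
illustration and are not part of Slurm. It only reports jobs and does not
kill anything.

```python
import subprocess


def parse_completing(squeue_output):
    """Turn 'JOBID NODELIST' lines into a list of (job_id, nodelist) tuples."""
    jobs = []
    for line in squeue_output.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and lines that carry no node list.
        if len(parts) == 2:
            jobs.append((parts[0], parts[1].strip()))
    return jobs


def list_completing_jobs():
    """Ask squeue for jobs in the CG state (hypothetical helper)."""
    result = subprocess.run(
        # %i is the job ID and %N is the node list in squeue's format syntax.
        ["squeue", "--states=CG", "--noheader", "--format=%i %N"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_completing(result.stdout)


if __name__ == "__main__":
    for job_id, nodes in list_completing_jobs():
        print(f"job {job_id} is still completing on {nodes}")
```

Running something like this from cron would at least show how often it
happens and on which nodes.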

Thank you.

-Ricardo

-- 
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
