On 23/1/20 7:09 pm, Dean Schulze wrote:
Pretty strange that having a Gres= property on a node that doesn't have
a GPU would get it stuck in the drain state.
Slurm verifies that nodes have the capabilities you say they have, so
that should a node boot with less RAM than it should have, or a soc
The problem turned out to be that I had Gres=gpu:gp100:1 on the NodeName
line for that node, and it didn't have a GPU or a gres.conf. Once I moved
that to the correct NodeName line in slurm.conf, that node came out of the
drain state and became usable again.
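A quick way to catch this kind of mistake before restarting slurmctld is to compare the Gres= entries in slurm.conf against the hosts you know actually have GPUs. The Python below is a rough sketch of that idea, not a Slurm tool: the function names are hypothetical, it assumes one NodeName definition per line with whitespace-separated key=value pairs, and it does not expand hostlist ranges such as node[01-04] (a NodeName=DEFAULT line is treated like any other entry). The set of GPU hosts is something you supply yourself.

```python
"""Rough lint for GPU GRES placement in slurm.conf (hypothetical helper).

Assumes one NodeName definition per line, whitespace-separated key=value
pairs and literal node names; hostlist ranges like node[01-04] are NOT
expanded.
"""


def parse_node_lines(slurm_conf: str) -> dict[str, dict[str, str]]:
    """Return {node_name: {lowercased_key: value}} for each NodeName= line."""
    nodes: dict[str, dict[str, str]] = {}
    for raw in slurm_conf.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line.lower().startswith("nodename="):
            continue
        # slurm.conf keys are case-insensitive, so compare them in lowercase
        fields = {
            key.lower(): value
            for key, value in (tok.split("=", 1) for tok in line.split() if "=" in tok)
        }
        nodes[fields["nodename"]] = fields
    return nodes


def declares_gpu(gres: str) -> bool:
    """True if a Gres= value such as 'gpu:gp100:1' includes a gpu entry."""
    return any(item.strip().startswith("gpu") for item in gres.split(","))


def misplaced_gpu_gres(slurm_conf: str, gpu_hosts: set[str]) -> list[tuple[str, str]]:
    """Nodes that declare a gpu GRES but are not in the known GPU host set."""
    return [
        (name, fields["gres"])
        for name, fields in parse_node_lines(slurm_conf).items()
        if declares_gpu(fields.get("gres", "")) and name not in gpu_hosts
    ]


def missing_gpu_gres(slurm_conf: str, gpu_hosts: set[str]) -> list[str]:
    """Known GPU hosts whose NodeName line lacks a gpu GRES."""
    nodes = parse_node_lines(slurm_conf)
    return sorted(
        host
        for host in gpu_hosts
        if host in nodes and not declares_gpu(nodes[host].get("gres", ""))
    )


if __name__ == "__main__":
    # Hypothetical node names reproducing the mistake described above
    conf = """
    NodeName=cpu01 CPUs=16 Gres=gpu:gp100:1
    NodeName=gpu01 CPUs=16
    """
    print(misplaced_gpu_gres(conf, {"gpu01"}))  # [('cpu01', 'gpu:gp100:1')]
    print(missing_gpu_gres(conf, {"gpu01"}))    # ['gpu01']
```

Checking both directions matters here: the Gres= entry was not wrong in itself, it was on the wrong NodeName line, so the GPU node was also missing its entry.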
Hey Dean,
Does 'scontrol show node https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons
Also check that the slurmd daemons on the compute nodes can talk to each
other (not just to the master); see, e.g., the bottom of
https://slurm.schedmd.com/big_sys.html
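The slurmd-to-slurmd check can be spot-tested from each compute node with a plain TCP connect to the other nodes' slurmd port. A minimal Python sketch, assuming the default SlurmdPort of 6818 (change it if your slurm.conf sets a different one); the helper names are illustrative only. A successful connect only shows that nothing, such as a firewall, is blocking the port, not that slurmd itself is healthy.

```python
import socket

SLURMD_PORT = 6818  # Slurm's default SlurmdPort; override if slurm.conf differs


def port_reachable(host: str, port: int = SLURMD_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable hostname, ...
        return False


def unreachable_peers(
    peers: list[str], port: int = SLURMD_PORT, timeout: float = 2.0
) -> list[str]:
    """Return the peers whose slurmd port could not be reached."""
    return [host for host in peers if not port_reachable(host, port, timeout)]
```

Run `unreachable_peers([...])` on each compute node with the hostnames of the other nodes (your own list); any host it returns is worth comparing against the firewall rules in the Niflheim page above.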
Regards,
Alex
I've tried the normal things with scontrol
(https://blog.redbranch.net/2015/12/26/resetting-drained-slurm-node/), but I
have a node that will not come out of the drain state.
I've also done a hard reboot and tried again. Are there any other remedies?
Thanks.
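When a node refuses to leave the drain state, the Reason field of `scontrol show node` usually says why. The Python sketch below pulls State and Reason out of that output for a quick check. It assumes the multi-line layout in which Reason= sits on its own line and every other field is a whitespace-separated key=value pair; treat that layout as an assumption and verify it against your Slurm version. The function names are hypothetical.

```python
import subprocess


def parse_scontrol_node(text: str) -> dict[str, str]:
    """Parse multi-line `scontrol show node` output into a field dict.

    Assumes Reason=... occupies the rest of its own line (it may contain
    spaces) and every other field is a whitespace-separated key=value token.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Reason="):
            fields["Reason"] = stripped[len("Reason="):]
            continue
        for token in stripped.split():
            if "=" in token:
                key, value = token.split("=", 1)
                fields[key] = value
    return fields


def is_drained(fields: dict[str, str]) -> bool:
    """True if any '+'-separated part of State starts with DRAIN."""
    parts = fields.get("State", "").upper().split("+")
    return any(part.startswith("DRAIN") for part in parts)


def node_status(node: str) -> dict[str, str]:
    """Run `scontrol show node <node>` and parse its output (needs Slurm)."""
    result = subprocess.run(
        ["scontrol", "show", "node", node],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_scontrol_node(result.stdout)
```

On a Slurm host, `node_status("node07")` (hypothetical node name) followed by a look at its "Reason" entry shows the recorded cause; once that cause is fixed, `scontrol update NodeName=node07 State=RESUME` is the usual way to return the node to service.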