Re: [slurm-users] [EXT] Slurmd problem on client

2020-08-24 Thread Brian Andrus
IIRC, that is because slurmd is trying to use the 'configless' feature of Slurm 20, where it uses DNS entries to find the config. This will happen if /etc/slurm.conf does not exist on the node. Check that the file exists there and that it is the same as the one on the master. Brian Andrus On 8/24/2020 7:03
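
The check suggested above (the config file exists on the node and matches the master's copy) could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, and it assumes a copy of the master's slurm.conf has already been fetched to the node by some other means.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_digest(path) -> Optional[str]:
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def config_status(node_conf, master_copy) -> str:
    """Compare the node's slurm.conf with a copy of the master's file.

    Both paths are supplied by the caller; nothing here talks to Slurm.
    """
    local = file_digest(node_conf)
    if local is None:
        # No config file on the node at this path.
        return "missing"
    reference = file_digest(master_copy)
    if reference is None:
        return "no-reference"
    return "same" if local == reference else "different"
```

For example, config_status("/etc/slurm.conf", "/tmp/slurm.conf.master") would return "missing", "same" or "different"; the second path is a made-up location for the fetched copy.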

Re: [slurm-users] [EXT] Slurmd problem on client

2020-08-24 Thread Lars Kloo
Dear Sean, '/usr/local/sbin/slurmd -D –' gave the following error (same as when running from systemctl): slurmd: error: _fetch_child: failed to fetch remote configs I have debug level 5 for both slurmctld and slurmd in slurm.conf, so there may be little more to extract in form of mes

Re: [slurm-users] [EXT] Slurmd problem on client

2020-08-24 Thread Sean Crosby
Make sure slurmd on the client is stopped, and then run it in verbose mode in the foreground e.g. /usr/local/slurm/latest/sbin/slurmd -D -v Then post the output -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melb

Re: [slurm-users] [EXT] Slurmd problem on client

2020-08-24 Thread Lars Kloo
Thanks Sean, Yes, the regular slurm commands work from the client. The firewalld daemon has been stopped/disabled, and iptables is set to let everything through, on both the master and the client. I should have mentioned that in the list of prerequisites in my initial e-mail. Best r

Re: [slurm-users] [EXT] Slurmd problem on client

2020-08-24 Thread Sean Crosby
Hi Lars, Do the regular slurm commands work from the client? e.g. squeue and scontrol show part. If they don't, it would be a sign of communication problems. Is there a software firewall running on the master/client? Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Compu

Re: [slurm-users] [slurm 20.02.3] don't suspend nodes in down state

2020-08-24 Thread Angelos Ching
I have some logic in SuspendProgram and its helper programs that makes sure the node to be acted on is in the idle state before any power action is performed. Best regards, Angelos (Sent from mobile, please pardon me for typos and cursoriness.) > On 2020/08/24 17:42, Jacek Budzowski wrote: > > > Dear
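
A guard of the kind described above might look something like this. It is a rough sketch and not the poster's actual code: the function names are hypothetical, and ignoring trailing state flags such as "*" or "~" is an assumption that may not suit every site.

```python
import re
import subprocess


def base_state(raw: str) -> str:
    """Lower-case a node state and drop trailing flag characters (e.g. '*', '~')."""
    return re.sub(r"[^a-z_]+$", "", raw.strip().lower())


def is_idle(raw: str) -> bool:
    """True only if the base state is exactly 'idle' (assumption: flags are ignored)."""
    return base_state(raw) == "idle"


def node_is_idle(node: str) -> bool:
    """Ask sinfo for the node's state; not called here, needs a working Slurm client."""
    out = subprocess.run(
        ["sinfo", "-h", "-n", node, "-o", "%T"],
        capture_output=True, text=True, check=True,
    ).stdout
    # A node in several partitions is listed once per partition.
    states = [line for line in out.splitlines() if line.strip()]
    return bool(states) and all(is_idle(s) for s in states)
```

A SuspendProgram wrapper could call node_is_idle() for each node it is handed and skip the power action for any node that returns False, so that nodes set to down for maintenance are left alone.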

[slurm-users] Slurmd problem on client

2020-08-24 Thread Lars Kloo
Hello, I have a client slurmd problem that I cannot really figure out how to solve. I would be grateful for any suggestions on how to move forward. The master computer on a small local computational cluster is getting quite old, and therefore I am currently in the process of replacing it.

Re: [slurm-users] [slurm 20.02.3] don't suspend nodes in down state

2020-08-24 Thread Jacek Budzowski
Dear Herbert, In our installation we also had this problem. Unfortunately we didn't find a more elegant solution than changing the Slurm code (and recompiling slurmctld). Here is the patch we use to prevent DOWN nodes from being suspended: diff --git a/src/slurmctld/power_save.c b/src/slurmctld/power_save.

[slurm-users] [slurm 20.02.3] don't suspend nodes in down state

2020-08-24 Thread Steininger, Herbert
Hi, how can I prevent Slurm from suspending nodes that I have set to the down state for maintenance? I know about "SuspendExcNodes", but rolling out slurm.conf every time this changes doesn't seem like the right way. Is there a state that I can set so that the nodes don't get suspended? It hap