IIRC, that is because it is trying to use the 'configless' feature of
Slurm 20, where it uses DNS entries to find the config.
This will happen if /etc/slurm.conf does not exist on the node.
Check that the file exists there and that it is identical to the one on the master.
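For example (the /etc path follows the note above, and the hostname "master" is just a placeholder for your head node; adjust both to your setup):

ls -l /etc/slurm.conf
md5sum /etc/slurm.conf
ssh master 'md5sum /etc/slurm.conf'   # compare the checksum against the master's copy

If the checksums differ or the file is missing, copy the master's slurm.conf over and restart slurmd.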
Brian Andrus
On 8/24/2020 7:03
Dear Sean,
'/usr/local/sbin/slurmd -D -v' gave the following error (same as when
running from systemctl):
slurmd: error: _fetch_child: failed to fetch remote configs
I have debug level 5 set for both slurmctld and slurmd in slurm.conf, so there may
be little more to extract in the form of messages.
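If more detail is needed, the -v flag can be repeated to raise the verbosity of the foreground run further, e.g. (same binary path as above):

/usr/local/sbin/slurmd -D -vvvv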
Make sure slurmd on the client is stopped, and then run it in verbose mode
in the foreground
e.g.
/usr/local/slurm/latest/sbin/slurmd -D -v
Then post the output
--
Sean Crosby | Senior DevOps/HPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne
Thanks Sean,
Yes, the regular slurm commands work from the client.
The firewalld daemon has been stopped/disabled, and iptables is set to let
everything through, on both the master and the client. I should have mentioned
that in the list of prerequisites in my initial e-mail.
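In case it is useful, a quick way to double-check on both machines (the nc test and the default slurmctld port 6817 are assumptions; adjust if SlurmctldPort is set differently in slurm.conf):

systemctl status firewalld
iptables -L -n
nc -zv <master> 6817   # from the client: is the slurmctld port reachable?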
Best regards,
Hi Lars,
Do the regular slurm commands work from the client?
e.g.
squeue
scontrol show part
If they don't, it would be a sign of communication problems.
Is there a software firewall running on the master/client?
Sean
--
Sean Crosby | Senior DevOps/HPC Engineer and HPC Team Lead
Research Computing Services | Business Services
I have some logic in SuspendProgram and its helper programs that makes sure the node
to be acted on is in the idle state before the power action is performed.
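Roughly, the guard looks like this (a simplified sketch, not the real script; the node names, the state check and the power-off step are placeholders):

#!/bin/bash
# SuspendProgram wrapper sketch: only power off nodes that Slurm reports as idle.
# slurmctld passes the node list as a hostlist expression in $1, e.g. "node[01-03]".
for node in $(scontrol show hostnames "$1"); do
    state=$(sinfo -h -n "$node" -o "%t")
    if [[ $state == idle* ]]; then
        echo "powering off $node"   # replace with the real power-off command (ipmitool, cloud API, ...)
    else
        echo "skipping $node, state is $state" >&2
    fi
done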
Best regards,
Angelos
(Sent from mobile, please pardon me for typos and cursoriness.)
> On 2020/08/24 17:42, Jacek Budzowski wrote:
>
>
> Dear Herbert,
Hello,
I have a client slurmd problem that I cannot really figure out how to
solve. I would be grateful for any suggestions on how to move forward.
The master computer of a small local computational cluster is getting quite
old, and I am therefore currently in the process of replacing it.
Dear Herbert,
In our installation we also had this problem.
Unfortunately, we didn't find a more elegant solution than a change in the Slurm
code (and recompiling slurmctld).
Here is the patch we use to prevent DOWN nodes from being suspended:
diff --git a/src/slurmctld/power_save.c b/src/slurmctld/power_save.c
Hi,
how can I prevent Slurm from suspending nodes that I have set to the DOWN state
for maintenance?
I know about "SuspendExcNodes", but rolling out slurm.conf every time this
changes doesn't seem like the right way.
Is there a state I can set so that the nodes don't get suspended?
It hap
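For reference, the SuspendExcNodes route mentioned above amounts to a slurm.conf entry plus a reconfigure (node names are placeholders; depending on the Slurm version a slurmctld restart may be needed instead):

SuspendExcNodes=node[01-02]

and then, on the master:

scontrol reconfigure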