Dear Mercan
Thank you! Yes, different paths, so different behaviour. Amazing how you can
spend so much time looking at something and not see it.
On Sunday I did an upgrade from 17.11.10 to 17.11.12 to try to fix the problem
but had left old binaries in a directory I should not have, so kept
Hi;
Are these typos, or are they really different paths?
/opt/exp_soft/slurm/bin/srun
vs.
which srun
/opt/exp_soft/bin/srun
Ahmet Mercan
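A rough way to spot this kind of mismatch is to list every copy of a command found on PATH, in search order, rather than only the first one that `which` reports. The sketch below is only illustrative: the function name `find_all_on_path` is made up here, and it treats a file as executable if any execute permission bit is set.

```python
import os
import sys


def find_all_on_path(name, path=None):
    """Return every file called `name` on the search path, in search order.

    `path` defaults to the PATH environment variable. Empty PATH entries
    (which a shell treats as the current directory) are skipped here.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Keep regular files that have at least one execute bit set.
        if os.path.isfile(candidate) and os.stat(candidate).st_mode & 0o111:
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Defaults to srun, the command discussed in this thread.
    command = sys.argv[1] if len(sys.argv) > 1 else "srun"
    for hit in find_all_on_path(command):
        print(hit)
```

If more than one line is printed, the first is the copy the shell would normally run. In bash, `type -a srun` gives similar information.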
On 13.11.2018 at 11:24, Scott Hazelhurst wrote:
Dear all
I still haven’t found the cause of the problem I raised last week, where srun -w
xx runs for some nodes but not for others. Thanks for the ideas.
One intriguing result I’ve had trying to pursue this which I thought I’d share
in case it sparks some ideas. If I give the full path for s
Yeah, these are frustrating ones to troubleshoot. When I have seen this
in the past it was usually a missing forward or reverse DNS entry that
caused the problem. You could try dialing up the verbosity all the way
and see what you can spot. Else I might recommend dropping a ticket
into the Sched
Thanks, Paul, yes, it does seem a likely cause, but I can’t see the problem.
All machines have the same /etc/hosts file and the worker nodes are just listed
one after another. I’ve checked that the problem nodes are there, with no
obvious difference. I’ve checked that the IP address is correct.
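Eyeballing a long hosts file is error-prone, so one option is to parse it and flag any name that appears with more than one address. This is a minimal sketch, assuming the usual hosts layout of an address followed by one or more names; the function names `parse_hosts` and `conflicting_names` are made up for illustration.

```python
import sys


def parse_hosts(text):
    """Map each name or alias in hosts-file text to the addresses it appears with."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            addresses = mapping.setdefault(name, [])
            if address not in addresses:
                addresses.append(address)
    return mapping


def conflicting_names(mapping):
    """Return only the names that are listed with more than one address."""
    return {name: addrs for name, addrs in mapping.items() if len(addrs) > 1}


if __name__ == "__main__":
    # Usage: pass the path of a hosts file, for example /etc/hosts.
    for hosts_path in sys.argv[1:]:
        with open(hosts_path) as handle:
            conflicts = conflicting_names(parse_hosts(handle.read()))
        for name, addrs in sorted(conflicts.items()):
            print(hosts_path, name, addrs)
```

Names such as localhost are often listed for both 127.0.0.1 and ::1, so a hit is a prompt to look closer, not proof of a fault.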
This smacks of either the submission host, the destination host, or the
master not being able to resolve the name to an IP. I would triple
check that to ensure that resolution is working.
-Paul Edmon-
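One way to run that check the same way on each machine is a small script that does a forward lookup and then a reverse lookup through the system resolver. This is only a sketch: `check_resolution` is a made-up name, it covers IPv4 only, and it says nothing about how Slurm itself resolves node names.

```python
import socket
import sys


def check_resolution(hostname):
    """Resolve hostname to an IPv4 address, then map that address back to a name.

    Returns a dict whose 'forward' and 'reverse' entries stay None when the
    corresponding lookup fails.
    """
    result = {"host": hostname, "forward": None, "reverse": None}
    try:
        # Forward lookup via the system resolver (hosts file, DNS, ...).
        result["forward"] = socket.gethostbyname(hostname)
    except OSError:
        return result
    try:
        # Reverse lookup of the address just obtained.
        result["reverse"] = socket.gethostbyaddr(result["forward"])[0]
    except OSError:
        pass
    return result


if __name__ == "__main__":
    # Usage: pass one or more node names as arguments.
    for host in sys.argv[1:]:
        print(check_resolution(host))
```

Running it with a failing node name such as n38 on the submission host, on the node itself, and on the controller should show whether any of the three disagrees.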
On 11/7/18 8:33 AM, Scott Hazelhurst wrote:
Dear list
We have a relatively new installation of SLURM. We have started to have a
problem with some of the nodes when using srun
[scott@cream-ce ~]$ srun --pty -w n38 hostname
srun: error: fwd_tree_thread: can't find address for host n38, check slurm.conf
srun: error: Task launch for 18710.0