Hi slurm users - I’ve been looking through the slurm prolog/epilog manuals, but
haven’t been able to figure out if there’s a way to get an epilog script
(requested by the user) to run when a job is killed for running out of time,
and have the epilog script be able to detect that (through an env
Yeah, these are frustrating ones to troubleshoot. When I have seen this
in the past it was usually a missing forward or reverse DNS record that
caused the problem. You could try dialing up the verbosity all the way
and see what you can spot. Else I might recommend dropping a ticket
into the Sched
Thanks, Paul, yes, it does seem a likely cause, but I can’t see the problem.
All machines have the same /etc/hosts file and the worker nodes are just listed
one after another. I’ve checked that the problem nodes are there, with no
obvious difference. I’ve checked that the IP address is correct.
This smacks of either the submission host, the destination host, or the
master not being able to resolve the name to an IP. I would triple
check that to ensure that resolution is working.
-Paul Edmon-
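
For anyone wanting to script the resolution check suggested above, here is a minimal sketch, not a tested tool. It assumes Python 3 is available on the submission host, the compute node, and the controller, and that you run it on each of them in turn. The names check_host and consistent are made up for illustration; only the standard socket calls are real.

```python
import socket
import sys


def check_host(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return the forward (name -> IP) and reverse (IP -> name) lookups for one host.

    The resolver functions are parameters so they can be swapped out;
    by default the system resolver is used (/etc/hosts, DNS, ...).
    """
    result = {"name": name, "ip": None, "reverse": None}
    try:
        result["ip"] = forward(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return result
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        result["reverse"] = reverse(result["ip"])[0]
    except OSError:
        pass
    return result


def consistent(result):
    """True when both lookups worked and the short host names agree."""
    if result["ip"] is None or result["reverse"] is None:
        return False
    return result["reverse"].split(".")[0] == result["name"].split(".")[0]


if __name__ == "__main__":
    # Host names are taken from the command line, e.g. the failing node names.
    for host in sys.argv[1:]:
        r = check_host(host)
        status = "ok" if consistent(r) else "MISMATCH"
        print(f"{host}: ip={r['ip']} reverse={r['reverse']} {status}")
```

Saved under a name of your choosing and run with the problem node names as arguments, a host that fails on one machine but not the others points at that machine's resolver configuration.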
On 11/7/18 8:33 AM, Scott Hazelhurst wrote:
Dear list
We have a relatively new installati
Dear list
We have a relatively new installation of SLURM. We have started to have a
problem with some of the nodes when using srun
[scott@cream-ce ~]$ srun --pty -w n38 hostname
srun: error: fwd_tree_thread: can't find address for host n38, check slurm.conf
srun: error: Task launch for 18710.0
On Wednesday, 7 November 2018 3:46:01 PM AEDT Brian Andrus wrote:
> Ah. I was getting ahead of myself. I used 'limits' and I have no limits
> configured, only associations. Changed it to just associations and all is
> good.
Excellent! Well spotted.
--
Chris Samuel : http://www.csamuel.org/
Try adding a default account and then set a limit of 0 jobs on it.
From memory, I think it is grpjobs.
This caps the number of jobs the account can have running at once
(GrpSubmitJobs also counts queued jobs).
This requires limits to be enforced via AccountingStorageEnforce in slurm.conf.
Or you could simply add the account to the denyaccount list for t
I had exactly the same requirement - you can find my notes from it here;
https://funinit.wordpress.com/2018/06/07/how-to-use-job_submit_lua-with-slurm/
cheers,
Marcin
On Tue, 6 Nov 2018 at 20:48, Sam Hawarden wrote:
> Hi Yair,
>
>
> You can set maxsubmitjob=0 on an account.
>
>
> The error mes