Hi,
The Slurm log shows that your prolog had not finished after 300 seconds, so
it was killed. The only likely cause I can see is the line that starts with
"sudo /usr/bin/beeond start -F -P -b /usr/bin/pdsh".
You can put a timeout command at the beginning of the sudo line to test:
timeout 150 sudo /usr/bin/beeond start -F -P -b /usr/bin/pdsh ......
If the timeout command solves the problem, then the sudo line is what hangs,
most likely because it is waiting for a password. In that case, check that
the sudoers permission is set correctly for password-less sudo. You can
verify this by executing the sudo line as the slurm user.
If the sudoers permission is correct but the command simply takes too long,
you can increase the 300-second threshold.
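The timeout test can also be wrapped so that the prolog reports when the limit was hit. This is only a minimal sketch: run_with_limit is a hypothetical helper name, not something Slurm or BeeOND provides, and it assumes the timeout command from GNU coreutils, which exits with status 124 when it has to stop the command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm or BeeOND): run a command under a
# time limit and report on stderr when the limit was hit.
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    rc=$?
    # GNU coreutils timeout exits with 124 when it had to stop the command.
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

# Harmless demonstration; in the prolog the arguments would be the sudo line.
run_with_limit 1 sleep 5 || echo "exit status: $?"
```

In the prolog itself, the call would look like "run_with_limit 150 sudo /usr/bin/beeond start ..." with the rest of your original arguments unchanged. An exit status of 124 then points at the beeond line rather than at something else in the script.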
Regards,
Ahmet M.
On 30.03.2022 15:59, Nicolas Sonoda wrote:
Hi!
I'm getting the following error from the prolog when I try to allocate more
than 2 nodes with sbatch:
[2022-03-28T07:40:17.016] backfill: Started JobId=19825 in intel_large on n[01-05]
[2022-03-28T07:45:17.310] _run_prolog: timeout after 300s: killing pgid 45004
[2022-03-28T07:45:17.310] error: prolog_slurmctld JobId=19825 prolog exit status 0:9
I have this configuration for my queue:
PartitionName=intel_large Nodes=n[01-10] Default=NO MaxTime=72:00:00 MaxNodes=5 OverSubscribe=EXCLUSIVE State=UP
I'm also attaching my slurmctld.prolog.
Can you help me with that?
Thanks!