Hi,

The Slurm log says that your prolog did not finish within 300 seconds, so it was killed.


The only possible cause I can see is the line starting with "sudo /usr/bin/beeond start -F -P -b /usr/bin/pdsh".


To test this, you can put a timeout command at the beginning of the sudo line:

timeout 150  sudo /usr/bin/beeond start -F -P -b /usr/bin/pdsh ......
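
To make the result of that test visible in the prolog output, the exit status can be checked: GNU timeout exits with status 124 when it has to kill the command. The following is only a minimal sketch, not part of your prolog; the helper name run_limited is hypothetical, and sleep/true are stand-ins for the real sudo beeond line so that it is safe to run anywhere.

```shell
#!/bin/bash
# Hypothetical helper: run a command under a time limit and report how it ended.
# Usage: run_limited <seconds> <command> [args...]
run_limited() {
    local limit="$1"; shift
    local rc=0
    # GNU timeout exits with 124 if the command was still running at the limit.
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*"
    else
        echo "finished with exit status ${rc}: $*"
    fi
    return "$rc"
}

# Stand-in commands; in the prolog this would be the sudo beeond line.
# "|| true" keeps the demo going even if the shell runs with "set -e".
run_limited 1 sleep 3 || true
run_limited 5 true || true
```

If the sudo line turns out to be waiting for a password, adding sudo's -n (non-interactive) option should make it fail immediately instead of hanging until the limit is reached.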


If the timeout command solves the problem, check that the sudoers permissions are set correctly for password-less sudo. You can verify this by running the same sudo line as the slurm user.


If the sudoers permissions are correct but the command simply takes too long, you can increase the 300-second threshold.


Regards,


Ahmet M.





On 30.03.2022 15:59, Nicolas Sonoda wrote:
Hi!

I'm getting the following error with prolog when I try to allocate more than 2 nodes with sbatch:

[2022-03-28T07:40:17.016] backfill: Started JobId=19825 in intel_large on n[01-05]
[2022-03-28T07:45:17.310] _run_prolog: timeout after 300s: killing pgid 45004
[2022-03-28T07:45:17.310] error: prolog_slurmctld JobId=19825 prolog exit status 0:9

I have this configuration for my queue:

PartitionName=intel_large Nodes=n[01-10] Default=NO MaxTime=72:00:00 MaxNodes=5 OverSubscribe=EXCLUSIVE State=UP

And I'm attaching my slurmctld.prolog

Can you help me with that?

Thanks!
