Re: [slurm-users] Problem with job allocation

mercan Wed, 30 Mar 2022 06:31:41 -0700

Hi;

The Slurm log says that your prolog did not finish within 300 seconds, so slurmctld killed it.

The only possible cause that I can see is the line starting with "sudo /usr/bin/beeond start -F -P -b /usr/bin/pdsh".



You can put a timeout command at the beginning of the sudo line to test:

timeout 150  sudo /usr/bin/beeond start -F -P -b /usr/bin/pdsh ......
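
As a minimal sketch of the same test, the prolog could wrap the call so that the log shows whether the time limit was actually hit. The helper name run_with_limit is hypothetical (it is not part of Slurm or BeeOND); the only fact relied on is that GNU timeout exits with status 124 when it has to kill the command:

```shell
#!/bin/bash
# Hypothetical helper: run a command under a time limit and report how it ended.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
run_with_limit() {
    local limit=$1
    shift
    timeout "$limit" "$@"
    local rc=$?   # exit status of timeout (124 means the limit was reached)
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    elif [ "$rc" -ne 0 ]; then
        echo "failed with status ${rc}: $*" >&2
    fi
    return "$rc"
}

# In the prolog, the sudo line would then be prefixed with: run_with_limit 150
```

A "timed out" message would point to a hanging command (for example sudo waiting for a password), while a "failed" message would point to an error inside the command itself.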

If the problem is solved with the timeout command, you should check that the sudoers permission is correctly set for password-less sudo (a sudo that waits for a password would hang until the prolog is killed). You can check the permission by executing this sudo line as the slurm user.

If the sudoers permission is correct but the command simply takes too much time, you can increase this 300-second threshold.
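
Assuming the 300-second limit on your system comes from the PrologEpilogTimeout parameter in slurm.conf (this is an assumption; "scontrol show config | grep -i PrologEpilogTimeout" should show the value in effect), raising it would look roughly like this:

```
# slurm.conf (600 is only an illustrative value, not a recommendation)
PrologEpilogTimeout=600
```

Slurm has to re-read its configuration afterwards, for example with "scontrol reconfigure".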



Regards,


Ahmet M.





On 30.03.2022 15:59, Nicolas Sonoda wrote:

Hi!
I'm getting the following error from the prolog when I try to allocate more than 2 nodes with sbatch:
[2022-03-28T07:40:17.016] backfill: Started JobId=19825 in intel_large on n[01-05]
[2022-03-28T07:45:17.310] _run_prolog: timeout after 300s: killing pgid 45004
[2022-03-28T07:45:17.310] error: prolog_slurmctld JobId=19825 prolog exit status 0:9
I have this configuration for my queue:
PartitionName=intel_large Nodes=n[01-10] Default=NO MaxTime=72:00:00 MaxNodes=5 OverSubscribe=EXCLUSIVE State=UP
And I'm attaching my slurmctld.prolog

Can you help me with that?

Thanks!
