On 02-09-2022 20:52, Nicolas Sonoda wrote:
I'm submitting a job, but after a few seconds it gets cancelled and the Slurm output file shows this message:

slurmstepd: error: *** JOB 23883 ON gn01 CANCELLED AT 2022-09-02T14:28:19 DUE TO JOB REQUEUE ***

After this, the job goes into the PD state in the queue, with the reason BeginTime:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
23884       gpu Memb.LS1     vhpc PD       0:00      1 (BeginTime)

And after a while the job stays in the RH state with the reason JobHoldMaxRequeue.

I'm attaching my script and input files.

Can you help me with that?

You could look in the slurmctld.log file and the node's slurmd.log file to see what they say about the job.

Check your slurm.conf requeue configuration:

$ scontrol show config | grep Requeue
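
As a rough sketch of the checks above, the two small helpers below filter the relevant lines out of scontrol output. The function names requeue_settings and job_requeue_status are made up for illustration and are not part of Slurm; the field names (JobState, Reason, Restarts) are assumed to appear in your version's "scontrol show job" output in the usual KEY=VALUE form.

```shell
# Hypothetical helper: keep only the requeue-related settings from
# `scontrol show config` output supplied on stdin (case-insensitive match).
requeue_settings() {
    grep -i 'requeue'
}

# Hypothetical helper: print the state, pending reason and restart count
# from `scontrol show job <jobid>` output supplied on stdin, one per line.
# Assumes the KEY=VALUE layout that scontrol normally prints.
job_requeue_status() {
    grep -E -o '(JobState|Reason|Restarts)=[^ ]+'
}
```

Typical use would be "scontrol show config | requeue_settings" and "scontrol show job 23884 | job_requeue_status". If the restart count has reached the configured limit, that would be consistent with the JobHoldMaxRequeue reason; once the underlying cause found in the logs is fixed, the held job can probably be released with "scontrol release 23884".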

/Ole
