Re: [slurm-users] srun using infiniband

2022-09-03 Thread Daniel Letai
Hello Anne, On 01/09/2022 02:01:53, Anne Hammond wrote: We have a CentOS 8.5 cluster, slurm 20.11, Mellanox ConnectX 6 HDR IB and Mellanox 32 port switch. Our application is not scaling. I

Re: [slurm-users] Jobs cancelled due to job requeue

2022-09-03 Thread Ole Holm Nielsen
On 02-09-2022 20:52, Nicolas Sonoda wrote: I'm submitting a job, but after a few seconds it gets cancelled and the Slurm output file shows this message: slurmstepd: error: *** JOB 23883 ON gn01 CANCELLED AT 2022-09-02T14:28:19 DUE TO JOB REQUEUE *** After this, the job goes into PD state in the queue