Re: [slurm-users] stopping job array after N failed jobs in row

2023-08-02 Thread Michael DiDomenico
On Tue, Aug 1, 2023 at 3:27 PM Daniel Letai wrote:
> The other OTHER approach might be to use some epilog (or possibly epilogslurmctld) to log exit codes for first 20 tasks in each array, and cancel the array if non-zero. This is a global approach which will affect all job arrays, so migh
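
For what it's worth, the decision part of that epilog idea could look roughly like the sketch below. This is only an illustration under assumptions: the helper names `should_cancel_array` and `scancel_command` are made up, task indices are assumed to start at 0, "cancel if non-zero" is read as "any of the first 20 tasks exited non-zero", and how the exit codes get logged and read back by the epilog is left out entirely.

```python
from typing import List, Mapping


def should_cancel_array(exit_codes: Mapping[int, int], first_n: int = 20) -> bool:
    """Return True if any of the first `first_n` array tasks exited non-zero.

    `exit_codes` maps array task index -> logged exit code. Tasks with an
    index >= first_n are ignored (assumption: indices start at 0).
    """
    return any(
        code != 0
        for task_id, code in exit_codes.items()
        if task_id < first_n
    )


def scancel_command(array_job_id: int) -> List[str]:
    """Build (but do not run) the scancel call for the whole array.

    Passing the array's job ID to scancel cancels all of its tasks.
    """
    return ["scancel", str(array_job_id)]


if __name__ == "__main__":
    # Hypothetical logged exit codes: task 5 failed.
    logged = {0: 0, 1: 0, 5: 1}
    if should_cancel_array(logged):
        print(" ".join(scancel_command(1234)))
```

The command is only built, not executed, so the logic can be checked outside a cluster before anything like it goes into a site-wide epilog.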

Re: [slurm-users] stopping job array after N failed jobs in row

2023-08-01 Thread Loris Bennett
Daniel Letai writes:
> Not sure about automatically canceling a job array, except perhaps by submitting 2 consecutive arrays - first of size 20, and the other with the rest of the elements and a dependency of afterok. That said, a single job in a job array in Slurm documentation is ref

Re: [slurm-users] stopping job array after N failed jobs in row

2023-08-01 Thread Daniel Letai
Not sure about automatically canceling a job array, except perhaps by submitting 2 consecutive arrays - first of size 20, and the other with the rest of the elements and a dependency of afterok. That said, a single job in a job array in Slurm documentation is refe
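
A rough sketch of the two-array submission, in case it helps. The function name `two_stage_commands`, the script name, and the `<PROBE_JOB_ID>` placeholder are made up for illustration, and task indices are assumed to run from 0 to `total_tasks - 1`. It only builds the two `sbatch` command lines and does not submit anything.

```python
from typing import List, Tuple


def two_stage_commands(
    script: str,
    total_tasks: int,
    first_n: int = 20,
    probe_job_id: str = "<PROBE_JOB_ID>",
) -> Tuple[List[str], List[str]]:
    """Build sbatch commands for a probe array and the dependent remainder.

    The first command submits tasks 0..first_n-1. The second submits the
    remaining tasks with an afterok dependency on the first array's job ID,
    which has to be filled in from the output of the first submission.
    """
    if not 0 < first_n < total_tasks:
        raise ValueError("first_n must be between 1 and total_tasks - 1")

    # --parsable makes sbatch print just the job ID, which is easier to capture.
    probe = ["sbatch", "--parsable", f"--array=0-{first_n - 1}", script]
    rest = [
        "sbatch",
        f"--dependency=afterok:{probe_job_id}",
        f"--array={first_n}-{total_tasks - 1}",
        script,
    ]
    return probe, rest


if __name__ == "__main__":
    probe_cmd, rest_cmd = two_stage_commands("job.sh", 100)
    print(" ".join(probe_cmd))
    print(" ".join(rest_cmd))
```

If the first array does not finish cleanly, the second one will normally just sit pending with an unsatisfiable dependency, so it may be worth looking at sbatch's `--kill-on-invalid-dep` option as well.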