On Tue, Aug 1, 2023 at 3:27 PM Daniel Letai <d...@letai.org.il> wrote:
> The other OTHER approach might be to use some epilog (or possibly
> epilogslurmctld) to log exit codes for first 20 tasks in each array, and
> cancel the array if non-zero. This is a global approach which will affect all
> job arrays, so might not be appropriate for your use case.
You can set up a task prolog/epilog. Just test for the error condition in the task epilog and then cancel your array if need be:
https://slurm.schedmd.com/prolog_epilog.html
I haven't tried it, and I'm not sure how it interacts with job arrays, but it might work.
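Here is a rough sketch of what such an epilog could look like. It's written in Python and is untested on a real cluster. It assumes the script sees SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID (and optionally SLURM_ARRAY_TASK_MIN). It also assumes the exit status arrives as SLURM_JOB_EXIT_CODE2 in `exit:signal` form. Those variable names, and which epilog type (task epilog, Epilog, or EpilogSlurmctld) actually receives them, are assumptions you should check against the prolog_epilog page for your Slurm version. The 20-task cutoff follows Daniel's suggestion. Cancelling runs `scancel` on the array's master job ID, which cancels every remaining task in that array.

```python
#!/usr/bin/env python3
"""Hypothetical epilog sketch: cancel a whole job array if one of its
first N tasks fails. Env var names are ASSUMPTIONS; verify them against
https://slurm.schedmd.com/prolog_epilog.html for your Slurm version."""
import os
import subprocess
import sys

FIRST_N = 20  # only inspect the first 20 tasks of each array


def parse_exit(value):
    """Parse an 'exit:signal' string (assumed format) into (exit, signal)."""
    exit_part, _, sig_part = value.partition(":")
    return int(exit_part), int(sig_part or 0)


def should_cancel(exit_value, task_id, task_min=0, first_n=FIRST_N):
    """True if this task is among the first `first_n` and did not succeed."""
    if task_id - task_min >= first_n:
        return False  # past the sampling window: leave the array alone
    code, sig = parse_exit(exit_value)
    return code != 0 or sig != 0


def cancel_command(array_job_id):
    """scancel on the array's master job ID cancels all of its tasks."""
    return ["scancel", str(array_job_id)]


def main():
    env = os.environ
    array_id = env.get("SLURM_ARRAY_JOB_ID")
    task_id = env.get("SLURM_ARRAY_TASK_ID")
    task_min = int(env.get("SLURM_ARRAY_TASK_MIN", "0"))
    exit_value = env.get("SLURM_JOB_EXIT_CODE2")  # assumed variable name
    if not (array_id and task_id and exit_value):
        return 0  # not an array task, or no exit info: do nothing
    if should_cancel(exit_value, int(task_id), task_min):
        print(f"task {task_id} failed ({exit_value}), cancelling array {array_id}",
              file=sys.stderr)
        subprocess.run(cancel_command(array_id), check=False)
    # Always exit 0 so Slurm doesn't treat the epilog itself as failed.
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

You'd point TaskEpilog (or Epilog/EpilogSlurmctld) in slurm.conf at the script. Note that the cancellation makes the remaining tasks end with a signal, so their epilogs may call scancel again. That should be harmless.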