On Tue, Aug 1, 2023 at 3:27 PM Daniel Letai wrote:
> The other OTHER approach might be to use some epilog (or possibly
> epilogslurmctld) to log exit codes for the first 20 tasks in each array, and
> cancel the array if non-zero. This is a global approach which will affect all
> job arrays, so migh
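A minimal sketch of the epilog idea, written as a Python EpilogSlurmctld script. It assumes slurmctld exports SLURM_ARRAY_JOB_ID, SLURM_ARRAY_TASK_ID, SLURM_ARRAY_TASK_MIN and SLURM_JOB_EXIT_CODE (the raw wait(2) status) to the script; check the Prolog and Epilog Guide for your Slurm version before relying on any of them. FIRST_N and should_cancel are hypothetical names chosen for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical EpilogSlurmctld hook: cancel a job array if any of its
first FIRST_N tasks exits non-zero. A sketch, not a production script."""
import os
import subprocess
import sys

# Hypothetical threshold matching the "first 20 tasks" idea.
FIRST_N = 20


def should_cancel(task_id, task_min, exit_status, first_n=FIRST_N):
    """True if this task is among the first `first_n` tasks of the array
    (counted from the array's lowest index) and did not exit cleanly.

    `exit_status` is the raw wait(2) status; 0 means a clean exit."""
    return task_id - task_min < first_n and exit_status != 0


def main():
    env = os.environ
    array_job_id = env.get("SLURM_ARRAY_JOB_ID")
    if not array_job_id:
        return 0  # not an array task: nothing to do
    task_id = int(env["SLURM_ARRAY_TASK_ID"])
    # Assumed variables; fall back to 0 if they are missing or empty.
    task_min = int(env.get("SLURM_ARRAY_TASK_MIN") or 0)
    exit_status = int(env.get("SLURM_JOB_EXIT_CODE") or 0)
    # Record each early task's status on stderr so failures can be audited.
    if task_id - task_min < FIRST_N:
        print(f"array {array_job_id} task {task_id}: status {exit_status}",
              file=sys.stderr)
    if should_cancel(task_id, task_min, exit_status):
        # Cancelling the array's base job ID cancels all of its tasks.
        subprocess.run(["scancel", array_job_id], check=False)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

It would be enabled site-wide via the EpilogSlurmctld parameter in slurm.conf (the script path is up to you), which is exactly why it affects every user's arrays. If several of the first 20 tasks fail, each one simply calls scancel again, which is harmless because the return code is ignored.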
Daniel Letai writes:
> Not sure about automatically canceling a job array, except perhaps by
> submitting two consecutive arrays: the first of size 20, and the second
> with the rest of the elements and a dependency of afterok. That said, a
> single job in a job array in Slurm documentation is ref
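The two-array approach can be scripted at submission time. Below is a sketch in Python, assuming a batch script (hypothetically job.sh) that reads SLURM_ARRAY_TASK_ID, and using sbatch's --parsable, --array, --dependency=afterok and --kill-on-invalid-dep options. The helper names are illustrative only.

```python
"""Sketch: split one job array into a small "canary" array plus a second
array for the remaining indices that starts only if every canary succeeds."""
import subprocess
import sys


def canary_args(script, first=20):
    # Indices 0..first-1; --parsable makes sbatch print just the job ID.
    return ["sbatch", "--parsable", f"--array=0-{first - 1}", script]


def rest_args(script, total, canary_job_id, first=20):
    # Indices first..total-1 keep their original SLURM_ARRAY_TASK_ID values.
    # afterok on the canary's job ID waits for all of its tasks to succeed;
    # --kill-on-invalid-dep=yes drops this array if that can no longer happen.
    return [
        "sbatch", "--parsable",
        f"--array={first}-{total - 1}",
        f"--dependency=afterok:{canary_job_id}",
        "--kill-on-invalid-dep=yes",
        script,
    ]


def parse_job_id(sbatch_output):
    # --parsable prints "jobid" or "jobid;cluster".
    return sbatch_output.strip().split(";")[0]


def submit(script, total, first=20):
    n = min(first, total)
    out = subprocess.run(canary_args(script, n), capture_output=True,
                         text=True, check=True).stdout
    canary_id = parse_job_id(out)
    if total > first:
        subprocess.run(rest_args(script, total, canary_id, first), check=True)
    return canary_id


if __name__ == "__main__":
    # Hypothetical usage: python split_array.py job.sh 1000
    print(submit(sys.argv[1], int(sys.argv[2])))
```

Keeping the second array at indices 20 onward means each task still sees the SLURM_ARRAY_TASK_ID it would have had in a single array. Without --kill-on-invalid-dep=yes (or an equivalent site-wide setting), a failed canary can leave the second array pending with reason DependencyNeverSatisfied until someone cancels it by hand.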
My users have discovered the beauty of job arrays, and they tend to use
them every now and then.
Sometimes the human factor steps in, something is wrong in the job array
specification, and the cluster "works" on one failed array job after another.
Is there any way to automatically stop/scancel/? a job array