On Thursday, 15 April 2021, at 10:58:31 (-0300),
Heitor wrote:
> I'm trying to set up NHC[0] for our Slurm cluster, but I'm not
> getting it to work properly.
Just for future reference, NHC has its own mailing lists, and even
though your question does relate to Slurm tangentially, it's really an
N
SchedMD is currently planning the annual Slurm User Group Meeting. We
would like to hold the meeting in person in the Salt Lake City, Utah,
USA area.
Could you please take a moment to fill out the three-question survey, linked
below, to help us know if you would be able to attend an in-person
Slurm User
Submission did succeed when I put -K0 inside the srun command within my
job script. It will be a while before my job runs, though, so I won't know
for some time whether the KillOnBadExit flag has helped.
On Tue, 20 Apr 2021 at 16:57, Robert Peck wrote:
> P.S. The Slurm version here is
P.S. The Slurm version here is 20.02.3
On Tue, 20 Apr 2021 at 16:55, Robert Peck wrote:
> Chris: thanks for that tip, I'm having a look at that now, it sounds
> promising.
>
> Run on the login node I get:
> scontrol show config | fgrep KillOnBadExit
> KillOnBadExit = 0
>
> I've tried t
Chris: thanks for that tip, I'm having a look at that now, it sounds
promising.
Run on the login node I get:
scontrol show config | fgrep KillOnBadExit
KillOnBadExit = 0
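For anyone who wants to script this check, here is a minimal sketch of a helper that pulls a single value out of that output. The function name config_value is hypothetical, and it assumes the "Name = value" layout shown above; it is not part of Slurm.

```shell
# Hypothetical helper: read "Name = value" lines on stdin and print the
# value for the parameter named in $1. Assumes the layout printed by
# `scontrol show config`; the name is matched exactly.
config_value() {
    awk -v key="$1" '
        {
            pos = index($0, "=")            # split at the first "=" only
            if (pos == 0) next              # skip lines without "="
            name  = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }'
}

# Intended usage on a node where scontrol is available:
#   scontrol show config | config_value KillOnBadExit
```

Splitting at the first "=" keeps values that themselves contain "=" intact.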
I've tried to put -K0 into a job to see if that helps.
But doing it on the command line
sbatch -K0 job_name.job
giv