after 15 minutes, but the third job
>requires only two nodes and 2 minutes, thus it can start immediately, but this
>does not happen.
It seems there is a bug here. I also tried version 18.03, but it does not
work there either.
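
To make the expectation explicit, here is a minimal sketch of the condition I have in mind. It is a simplified backfill test, not Slurm's actual scheduler logic. The function name and the idle node count are assumptions for illustration; the 2 nodes, 2 minutes and 15 minutes are the values from the example above.

```python
def can_backfill(job_nodes, job_minutes, idle_nodes, minutes_until_reserved_start):
    """Simplified backfill test (hypothetical helper, not a Slurm API).

    A lower-priority job may start now if it fits in the currently idle
    nodes and finishes before the higher-priority job is due to start.
    """
    fits_now = job_nodes <= idle_nodes
    ends_in_time = job_minutes <= minutes_until_reserved_start
    return fits_now and ends_in_time


# Third job: 2 nodes for 2 minutes; the second job starts after 15 minutes.
# idle_nodes=2 is an assumed value for this illustration.
print(can_backfill(job_nodes=2, job_minutes=2,
                   idle_nodes=2, minutes_until_reserved_start=15))
```

Under these assumptions the check passes, which is why I expected the third job to start immediately.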
Ana
On Fri, 30 Nov 2018 at 17:46, Ken
The heterogeneous job support page lists some limitations that mention
backfill:
https://slurm.schedmd.com/heterogeneous_jobs.html#limitations
Maybe there’s some information there to help?
Ken
From: slurm-users On Behalf Of Ana Jokanovic
Sent: Thursday, November 29, 2018 4
From: slurm-users On Behalf Of Kenneth Roberts
Sent: Monday, November 26, 2018 9:38 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Slurm / OpenHPC socket timeout errors
I wasn't looking closely enough at the times in the log file.
c2: [2018-11-26T10:09:40.963] debu
g out after 20 seconds.
Back to finding out why ...
From: slurm-users On Behalf Of Kenneth Roberts
Sent: Monday, November 26, 2018 8:35 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Slurm / OpenHPC socket timeout errors
Here is the debug log on a node (c2) when the job fa
ors reading
slurm.conf ...
Continuing the search ...
From: slurm-users On Behalf Of Kenneth Roberts
Sent: Friday, November 23, 2018 4:15 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Slurm / OpenHPC socket timeout errors
Hi -
I have the following on a new cluster with OpenHPC & Slurm built off the
latest recipe and packages from OpenHPC (built this week).
One master node and 4 compute nodes.
NodeName=c[1-4] Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 State=UNKNOWN
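
For reference, the CPU count implied by that line can be worked out with a short sketch. The helper names are hypothetical, and the range parser only handles a simple "prefix[lo-hi]" node name, not the full Slurm hostlist syntax.

```python
import re

NODE_LINE = "NodeName=c[1-4] Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 State=UNKNOWN"


def parse_fields(node_line):
    """Split a slurm.conf-style line into a dict of Key=Value pairs."""
    return dict(re.findall(r"(\w+)=(\S+)", node_line))


def cpus_per_node(fields):
    """CPUs per node = Sockets * CoresPerSocket * ThreadsPerCore."""
    return (int(fields["Sockets"])
            * int(fields["CoresPerSocket"])
            * int(fields["ThreadsPerCore"]))


def node_count(node_name):
    """Count nodes in a simple 'prefix[lo-hi]' range; anything else counts as 1."""
    match = re.fullmatch(r"[^\[\]]+\[(\d+)-(\d+)\]", node_name)
    if match is None:
        return 1
    low, high = int(match.group(1)), int(match.group(2))
    return high - low + 1


fields = parse_fields(NODE_LINE)
nodes = node_count(fields["NodeName"])
per_node = cpus_per_node(fields)
print(f"{nodes} nodes x {per_node} CPUs = {nodes * per_node} CPUs")
```

So each compute node should report 20 CPUs, for 80 CPUs across the four nodes.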
With simple test scripts, sbatch prod