Perhaps run it from srun with -vvv to get maximum verbosity as srun works
through the job.
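For example, something along these lines (just a sketch; the node/task counts
and the test script name are placeholders for whatever the failing test
actually runs):

    srun -vvv -N2 -n4 ./my_test.sh 2>&1 | tee srun-debug.log

The -vvv output should show srun's steps (allocation, task launch, and so on),
which usually narrows down where things go wrong.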
Doug
On Thu, Jan 31, 2019 at 12:07 PM Andy Riebs wrote:
> Hi All,
>
> Just checking to see if this sounds familiar to anyone.
>
> Environment:
> - CentOS 7.5 x86_64
> - Slurm 17.11.10 (but this also happened with 17.11.5)
Hi,
Thanks again for all the suggestions.
It turns out that on our cluster we can't use cgroups because of the old
kernel,
but setting
JobAcctGatherParams=UsePSS
resolved the problems.
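In case it helps anyone else, this is roughly what the relevant slurm.conf
section looks like here (a sketch only: the gather frequency is just an
illustrative value, and UsePSS only applies with the jobacct_gather/linux
plugin):

    # Account for memory using PSS rather than RSS, so pages shared
    # between processes are not double-counted against the job
    JobAcctGatherType=jobacct_gather/linux
    JobAcctGatherParams=UsePSS
    JobAcctGatherFrequency=30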
Regards,
Sergey
On Fri, 2019-01-11 at 10:37 +0200, Janne Blomqvist wrote:
> On 11/01/2019
To be clearer: the jobs aren't starting because the group is at its limit,
which is normal. But Slurm is spamming that error to the log file for every
job that is at a particular GrpTRESRunLimit, which is not normal.
Other than the log being littered with incorrect error messages, things
Hi All,
Just checking to see if this sounds familiar to anyone.
Environment:
- CentOS 7.5 x86_64
- Slurm 17.11.10 (but this also happened with 17.11.5)
We typically run about 100 tests/night, selected from a handful of
favorites. For roughly 1 in 300 test runs, we see one of two mysterious
fa
On 1/31/19 8:12 AM, Christopher Benjamin Coffey wrote:
This seems to be related to jobs that can't start due to, in our case,
AssocGrpMemRunMinutes and AssocGrpCPURunMinutesLimit.
Must be a bug relating to GrpTRESRunLimit, it seems.
Do you mean can't start due to not enough time, or can't star
Hi All,
This seems to be related to jobs that can't start due to, in our case,
AssocGrpMemRunMinutes and AssocGrpCPURunMinutesLimit.
Must be a bug relating to GrpTRESRunLimit, it seems.
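For what it's worth, one way to see which associations actually carry those
running-minutes limits (a sketch; the account name is just a placeholder) is:

    # List the GrpTRESRunMins limits on associations under one account
    sacctmgr show assoc account=myaccount format=Cluster,Account,User,GrpTRESRunMins

Jobs held only by such a limit should simply sit pending with a reason like
AssocGrpCPURunMinutesLimit in squeue, rather than generating slurmctld errors.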
Best,
Chris
—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
On 1
Hi, we upgraded to 18.08.5 this morning and are seeing odd errors in the
slurmctld logs:
[2019-01-31T08:24:13.684] error: select_nodes: calling _get_req_features() for
JobId=16599048 with not NULL job resources
[2019-01-31T08:24:13.685] error: select_nodes: calling _get_req_features() for
JobId
No. Jobs should continue as normal.
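For reference, on a systemd-managed install that's just the usual

    systemctl restart slurmctld

on the head node; the slurmd daemons on the compute nodes keep running, so the
jobs carry on.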
-Paul Edmon-
On 1/31/19 9:38 AM, Buckley, Ronan wrote:
Hi,
Does restarting the slurmctld daemon on a slurm head node affect
running slurm jobs on the compute nodes in any way?
Rgds
Nope, per the documentation you have to restart slurmctld to change
MaxJobCount.
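Roughly, the change would look like this (a sketch: 250,000 is the value from
the question, and the restart command assumes a systemd-managed controller):

    # slurm.conf on the controller (and anywhere else the file is shared)
    MaxJobCount=250000

    # a plain "scontrol reconfigure" is not sufficient for this parameter
    systemctl restart slurmctld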
-Paul Edmon-
On 1/31/19 5:58 AM, Buckley, Ronan wrote:
Hi,
I want to increase the MaxJobCount in the slurm.conf file from its
default value of 10,000. I want to increase it to 250,000.
The online documentation says:
Hi,
Does restarting the slurmctld daemon on a slurm head node affect running slurm
jobs on the compute nodes in any way?
Rgds
Hi,
I want to increase the MaxJobCount in the slurm.conf file from its default
value of 10,000. I want to increase it to 250,000.
The online documentation says:
MaxJobCount
The maximum number of jobs Slurm can have in its active database at one time.
Set the values of MaxJobCount and MinJobAge