Hi Mike,
but that would mean that 409978 requests nearly the whole cluster. I'm
wondering what resources it is waiting for.
Yet there are nearly 32000 nodes idle. I would assume such a one-node
job would fit. But you are right, it depends on the higher-priority job.
Best
Marcus
On 3/30/20 3:47 PM, Ren
CentOS 7.7
Slurm 20.02
Having a bit of a time with jobs that are configured with a walltime of more
than 365 days. The job is accepted for run, but the squeue -l output shows the
TIME_LIMIT is INVALID.
If I look at the job through scontrol, it shows the correct TimeLimit.
Any ideas as to what c
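For anyone wanting to compare the limits reported by the two tools, here is a minimal sketch (not Slurm's own code) that converts a time-limit string into minutes and checks it against 365 days. The helper names parse_time_limit and exceeds_365_days are made up for illustration; the accepted formats are the ones documented for sbatch --time (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds). Special values such as UNLIMITED are not handled.

```python
def parse_time_limit(text: str) -> float:
    """Convert a Slurm-style time string to minutes (illustrative helper)."""
    text = text.strip()
    days = 0
    if "-" in text:
        # Forms with a day count: days-hours[:minutes[:seconds]]
        day_part, rest = text.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if len(parts) > 3:
            raise ValueError(f"unrecognised time string: {text!r}")
        hours, minutes, seconds = (parts + [0, 0])[:3]
    else:
        # Forms without a day count: minutes, minutes:seconds,
        # or hours:minutes:seconds
        parts = [int(p) for p in text.split(":")]
        if len(parts) == 1:
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:
            hours, (minutes, seconds) = 0, parts
        elif len(parts) == 3:
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unrecognised time string: {text!r}")
    return days * 24 * 60 + hours * 60 + minutes + seconds / 60


def exceeds_365_days(text: str) -> bool:
    """True if the time string is longer than 365 days (assumed threshold)."""
    return parse_time_limit(text) > 365 * 24 * 60


if __name__ == "__main__":
    for limit in ("90", "01:30:00", "1-12", "366-00:00:00"):
        print(limit, parse_time_limit(limit), exceeds_365_days(limit))
```

This only helps to check which jobs fall over the 365-day mark; it says nothing about why squeue prints INVALID for them.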
All of this is subject to scheduler configuration, but: what has job 409978
requested, in terms of resources and time? It looks like it's the highest
priority pending job in the interactive partition, and I’d expect the
interactive partition has a higher priority than the regress partition.
As
We have the same issue; see:
* https://bugs.schedmd.com/show_bug.cgi?id=8527
* As a temporary fix, we switched back to DefMemPerCPU
regards
On 26/03/2020 16:42, Wayne Hendricks wrote:
When using 20.02/cons_tres and defining DefMemPerGPU, jobs submitted
that request GPUs without defining “--mem” will