We have the same issue, see:
* https://bugs.schedmd.com/show_bug.cgi?id=8527
* As a temporary fix we switched back to DefMemPerCPU (sketch below).
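For reference, a minimal sketch of that workaround, with illustrative values rather than our exact production settings: replacing DefMemPerGPU with an equivalent DefMemPerCPU keeps a bounded memory default for jobs that omit --mem. With DefCpuPerGPU=20, DefMemPerCPU=6250 gives the same 20 * 6250 = 125000 MB per GPU as the original DefMemPerGPU=125000.

# slurm.conf sketch (illustrative values): per-CPU default instead of per-GPU
PartitionName=p100 Nodes=ucs480 OverSubscribe=FORCE:4 DefCpuPerGPU=20 DefMemPerCPU=6250 Default=YES MaxTime=INFINITE State=UP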
regards
On 26/03/2020 16:42, Wayne Hendricks wrote:
When using 20.02/cons_tres with DefMemPerGPU defined, jobs that request
GPUs without specifying "--mem" will not share a node: only one such job
runs per node. I can see that the correct amount of memory is allocated
for the job per GPU requested, but no other jobs will run on the node.
If a value for "--mem" is given explicitly, other jobs will share the
node. Is this the expected behavior? I understand that when a job does
not request memory it is assumed to need the whole node, but here, when
asking for GPUs, a default memory is set via DefMemPerGPU, and it seems
this is not being taken into account. Let me know if there is a reason
for this behavior or if there is another way to set the default job
memory.
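To make the difference concrete, here are the two submission styles (the script name is a placeholder; the GPU count assumes a 4-GPU node, given 80 CPUs and DefCpuPerGPU=20):

# Memory requested explicitly: a second job can share the node.
sbatch --gres=gpu:2 --mem=250000 test.sh

# Memory left to DefMemPerGPU: the allocation itself looks correct,
# but a second job stays pending with Reason=Resources.
sbatch --gres=gpu:2 test.sh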
Config:
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
PartitionName=p100 Nodes=ucs480 OverSubscribe=FORCE:4 DefCpuPerGPU=20 DefMemPerGPU=125000 Default=YES MaxTime=INFINITE State=UP
Node and job state when two jobs are submitted, each requesting half the
GPUs (no --mem specified):
Node state:
CfgTRES=cpu=80,mem=500000M,billing=80
AllocTRES=cpu=40,mem=250000M
Job state:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
872 p100 test-s6 wayne.he PD 0:00 1 (Resources)
871 p100 test-s5 wayne.he R 0:03 1 ucs480
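For reference, output like the above can be gathered with the standard tools (assuming the node and partition names from the config):

scontrol show node ucs480 | grep TRES
squeue -p p100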
--
Bas van der Vlies
| Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG Amsterdam
| T +31 (0) 20 800 1300 | bas.vandervl...@surfsara.nl | www.surfsara.nl |