[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-04-04 Thread Davide DelVento via slurm-users
Ciao Massimo, How about creating another queue cpus_in_the_gpu_nodes (or something less silly) which targets the GPU nodes but does not allow the allocation of the GPUs with gres and allocates 96-8 (or whatever other number you deem appropriate) of the CPUs (and similarly with memory)? Actually it
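A minimal slurm.conf sketch of such a partition follows. It assumes four GPU nodes named gpu[01-04] with 96 cores each and a QOS named nogpu; the node names, memory figure, and QOS name are hypothetical and do not come from the thread.

```conf
# Hypothetical names and sizes; adjust to the real cluster.
# MaxCPUsPerNode caps the CPUs this partition may use on each node
# (96 - 8 = 88), leaving 8 cores free for jobs in the GPU partition.
# MaxMemPerNode (in MB) applies the same idea to memory.
PartitionName=cpus_in_the_gpu_nodes Nodes=gpu[01-04] MaxCPUsPerNode=88 MaxMemPerNode=700000 QOS=nogpu
```

The partition definition does not block GPU requests by itself. In this sketch that job is left to the assumed nogpu QOS, which would have to be created with sacctmgr and given a limit such as MaxTRESPerNode=gres/gpu=0.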

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-04-01 Thread Davide DelVento via slurm-users
Yes, I think so, but that should be no problem. I think that requires Slurm to have been built with the --enable-multiple-slurmd configure option, so you might need to rebuild Slurm if you didn't use that option in the first place. On Mon, Mar 31, 2025 at 7:32 AM Massimo Sgaravatto < massimo.sgarava
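As a rough illustration of running two slurmd daemons on one physical host, a slurm.conf fragment might look like the sketch below. The host name gpuhost01, the port numbers, the file paths, and the 8/88 CPU split are assumptions made for this example and do not come from the thread.

```conf
# Hypothetical host name, ports, and CPU split: two logical nodes on one host.
# GresTypes must list gpu before a node can declare Gres=gpu:N.
GresTypes=gpu
# Each slurmd needs its own port, and %n (the node name) keeps the
# per-daemon spool directory, PID file, and log file apart.
SlurmdSpoolDir=/var/spool/slurmd.%n
SlurmdPidFile=/var/run/slurmd.%n.pid
SlurmdLogFile=/var/log/slurmd.%n.log
NodeName=gpu01       NodeHostname=gpuhost01 Port=6818 CPUs=8  Gres=gpu:4
NodeName=cpusingpu01 NodeHostname=gpuhost01 Port=6819 CPUs=88
```

Each daemon would then be started under its own node name ("slurmd -N gpu01" and "slurmd -N cpusingpu01"). The Slurm documentation presents multiple-slurmd mode mainly as a testing facility, so its behavior for core binding and GPU access in production should be verified before relying on it.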

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-03-31 Thread Paul Edmon via slurm-users
To me at least the simplest solution would be to create 3 partitions. The first is for the CPU-only nodes, the second is for the GPU nodes, and the third is a lower-priority requeue partition. This is how we do it here. This way the requeue partition can be used to grab the CPUs on the GPU nodes wi
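The three-partition layout described above could be sketched in slurm.conf roughly as follows. The node names, node counts, and priority values are hypothetical, and the snippet assumes partition-priority preemption with requeue, which the message does not spell out.

```conf
# Hypothetical node names and sizes; adjust to the real cluster.
GresTypes=gpu
# Assumed global setting: jobs in a higher PriorityTier partition
# may preempt jobs in a lower one.
PreemptType=preempt/partition_prio

NodeName=cpu[01-10] CPUs=96
NodeName=gpu[01-04] CPUs=96 Gres=gpu:4

# Regular partitions: higher tier, never preempted.
PartitionName=cpus    Nodes=cpu[01-10] PriorityTier=10 PreemptMode=OFF Default=YES
PartitionName=gpus    Nodes=gpu[01-04] PriorityTier=10 PreemptMode=OFF
# Lower-priority partition spanning all nodes; its jobs are requeued
# when a job in cpus or gpus needs the resources.
PartitionName=requeue Nodes=cpu[01-10],gpu[01-04] PriorityTier=1 PreemptMode=REQUEUE
```

With a layout like this, CPU jobs submitted to the requeue partition can run on idle cores of the GPU nodes, at the cost of being requeued when a job in a regular partition needs those resources.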

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-03-31 Thread Paul Raines via slurm-users
What I have done is set up partition QOSes for nodes with 4 GPUs and 64 cores:

    sacctmgr add qos lcncpu-part
    sacctmgr modify qos lcncpu-part set priority=20 \
        flags=DenyOnLimit MaxTRESPerNode=cpu=32,gres/gpu=0
    sacctmgr add qos lcngpu-part
    sacctmgr modify qos lcn-part set priority=20 \
        flag

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-03-31 Thread Massimo Sgaravatto via slurm-users
Hi Davide, Thanks for your feedback. If gpu01 and cpusingpu01 are physically the same node, doesn't this mean that I have to start 2 slurmd on that node (one with "slurmd -N gpu01" and one with "slurmd -N cpusingpu01")? Thanks, Massimo On Mon, Mar 31, 2025 at 3:22 PM Davide DelVento wrote: >