[slurm-users] Slurm 24.05 and OpenMPI

2025-04-04 Thread Matthias Leopold via slurm-users
Hi, I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and NVIDIA deepops framework a couple of years ago. It is based on Ubuntu 20.04 and makes use of the NVIDIA pyxis/enroot container solution. For operational validation I used the nccl-tests application in a container. nccl-tests

[slurm-users] Re: Run a command in Slurm with all streams and signals connected to the submitting command

2025-04-04 Thread Chris Samuel via slurm-users
On 4/4/25 5:23 am, Michael Milton via slurm-users wrote: Plain srun re-uses the existing Slurm allocation, and specifying resources like --mem will just request them from the current job rather than submitting a new one srun does that as it sees all the various SLURM_* environment variables
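Since srun decides to reuse the allocation because it sees the SLURM_* variables, one possible workaround is to launch it with those variables removed. A minimal Python sketch, assuming that stripping every SLURM_* variable is enough for srun to request a fresh allocation (the excerpt stops before any recommendation, and a site that relies on a variable such as SLURM_CONF would need to keep it). Both function names are hypothetical.

```python
from __future__ import annotations

import os
import subprocess
from typing import Mapping, Sequence


def environment_without_slurm(env: Mapping[str, str] | None = None) -> dict[str, str]:
    """Copy of `env` (default: os.environ) with every SLURM_* variable dropped.

    Assumption: srun treats itself as part of an existing job because it
    finds these variables, so a child that cannot see them should ask the
    controller for a new allocation instead. The input is not modified.
    """
    source = os.environ if env is None else env
    return {name: value for name, value in source.items()
            if not name.startswith("SLURM_")}


def srun_new_allocation(args: Sequence[str]) -> int:
    """Hypothetical helper: run `srun <args>` without the enclosing job's
    SLURM_* variables and return srun's exit status."""
    completed = subprocess.run(["srun", *args], env=environment_without_slurm())
    return completed.returncode
```

For example, `srun_new_allocation(["--mem=64G", "python", "step.py"])` would pass --mem to a new request rather than to the current job, provided the assumption above holds on the cluster in question.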

[slurm-users] Re: Run a command in Slurm with all streams and signals connected to the submitting command

2025-04-04 Thread Michael Milton via slurm-users
Thanks Davide, It's true that srun will create an allocation if you aren't inside a job, but if you are inside a job and you request more resources than it has, then srun will just fail. This is the key issue that I want to avoid. On Sat, Apr 5, 2025 at 11:48 AM Davide DelVento wrote: > The pla

[slurm-users] Re: Minimum cpu cores per node partition level configuration

2025-04-04 Thread Cutts, Tim via slurm-users
You can set a partition QoS which specifies a minimum. We have such a qos on our large-gpu partition; we don’t want people scheduling small stuff to it, so we have this qos: $ sacctmgr show qos large-gpu --json | jq '.QOS[] | { name: .name, min_limits: .limits.min }' { "name": "large-gpu
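Where jq is not available, the filter from this message can be reproduced in Python. A sketch, assuming the `QOS` / `name` / `limits.min` layout implied by the jq expression; the JSON produced by `sacctmgr --json` may differ between Slurm releases, and the function names are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, List


def qos_min_limits(sacctmgr_json: str) -> List[Dict[str, Any]]:
    """Python rendering of the jq filter
    .QOS[] | { name: .name, min_limits: .limits.min }
    """
    document = json.loads(sacctmgr_json)
    results = []
    for qos in document["QOS"]:
        limits = qos.get("limits") or {}
        # jq yields null for a missing key; .get() yields None the same way
        results.append({"name": qos.get("name"), "min_limits": limits.get("min")})
    return results


def show_qos_min_limits(qos_name: str) -> List[Dict[str, Any]]:
    """Run the same sacctmgr query as in the message and filter its output."""
    output = subprocess.run(
        ["sacctmgr", "show", "qos", qos_name, "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return qos_min_limits(output)
```

`show_qos_min_limits("large-gpu")` mirrors the command line above; keeping the parsing in a separate function makes it testable without a Slurm installation.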

[slurm-users] Re: Preemption question

2025-04-04 Thread Kamil Wilczek via slurm-users
Hello David, thank you, this might be a simple and viable solution to this problem. I'll test both solutions (yours and Megan's) and then decide. Kind regards -- On Sun, Mar 30, 2025 at 08:19:12AM -0600, Davide DelVento via slurm-users wrote: Hi Kamil, I don't use QoS, so I don't have a dire

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-04-04 Thread Davide DelVento via slurm-users
Ciao Massimo, How about creating another queue cpus_in_the_gpu_nodes (or something less silly) which targets the GPU nodes but does not allow the allocation of the GPUs with gres and allocates 96-8 (or whatever other number you deem appropriate) of the CPUs (and similarly with memory)? Actually it

[slurm-users] Run a command in Slurm with all streams and signals connected to the submitting command

2025-04-04 Thread Michael Milton via slurm-users
I'm helping with a workflow manager that needs to submit Slurm jobs. For logging and management reasons, the job (e.g. srun python) needs to be run as though it were a regular subprocess (python): - stdin, stdout and stderr for the command should be connected to process inside the job - s
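On the submitting side, the first two requirements (streams connected, signals passed through) can be approximated with a small wrapper around the launch command. A minimal Python sketch, assuming the workflow manager is itself a Python process and that SIGINT and SIGTERM are the signals it needs to relay; `run_forwarding_signals` is a hypothetical name, and the sketch does not address getting a new allocation from inside a job.

```python
import signal
import subprocess
from typing import Optional, Sequence

# Signals relayed from the wrapper to the child (assumption: these two
# cover what the workflow manager uses to stop a task).
FORWARDED_SIGNALS = (signal.SIGINT, signal.SIGTERM)


def run_forwarding_signals(cmd: Sequence[str]) -> int:
    """Run `cmd` (e.g. ["srun", "python", "task.py"]) like a plain subprocess.

    stdin, stdout and stderr are inherited because Popen is given no
    redirections. SIGINT/SIGTERM delivered to this process are relayed to
    the child, and the child's exit status is returned.
    """
    child: Optional[subprocess.Popen] = None

    def relay(signum, _frame):
        # Signals that arrive before the child has started are dropped.
        if child is not None:
            child.send_signal(signum)

    # Install the handlers before starting the child, so a signal arriving
    # during startup cannot kill the wrapper with the default action.
    previous = {sig: signal.signal(sig, relay) for sig in FORWARDED_SIGNALS}
    try:
        child = subprocess.Popen(list(cmd))
        return child.wait()
    finally:
        # Restore whatever handlers were in place before the call.
        for sig, handler in previous.items():
            signal.signal(sig, handler if handler is not None else signal.SIG_DFL)
```

A child killed by a signal reports a negative status (e.g. -15 for SIGTERM). When Ctrl-C comes from a terminal, the child is in the same foreground process group and already receives SIGINT directly, so relaying it can deliver it twice; starting the child with `start_new_session=True` avoids that, at the cost of detaching it from the terminal.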