[slurm-users] Node configuration unavailable when using --mem-per-gpu, for specific GPU type

2024-12-13 Thread Matthew R. Baney via slurm-users
Hi all, I'm seeing some odd behavior when using the --mem-per-gpu flag instead of the --mem flag to request memory when also requesting all available CPUs on a node (in this example, all available nodes have 32 CPUs): $ srun --ntasks-per-node=8 --cpus-per-task=4 --gpus-per-node=gtx1080ti:1 --mem-[...]
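
The command in the preview is cut off by the archive; a minimal sketch of the comparison being described (the memory value and the trailing command are illustrative, not taken from the original message) might look like:

    # Per-node memory request on a 32-CPU node: 8 tasks x 4 CPUs = all CPUs
    $ srun --ntasks-per-node=8 --cpus-per-task=4 --gpus-per-node=gtx1080ti:1 --mem=16G hostname

    # Same job shape expressed as memory per GPU; per the subject line, this
    # is the form that fails with "Requested node configuration is not available"
    $ srun --ntasks-per-node=8 --cpus-per-task=4 --gpus-per-node=gtx1080ti:1 --mem-per-gpu=16G hostname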

[slurm-users] Optimizing CPU socket affinities and NVLink

2024-08-08 Thread Matthew R. Baney via slurm-users
Hello, I've recently adopted setting AutoDetect=nvml in our GPU nodes' gres.conf files to automatically populate Cores and Links for GPUs, which has been working well. I'm now wondering if I can prioritize having single GPU jobs scheduled on NVLink pairs (these are PCIe A6000s) where one of the G[...]
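
For reference, a hedged sketch of the gres.conf setup being described, alongside a hand-written equivalent for a node with two NVLink-bridged pairs of A6000s (device paths, core ranges, and link counts are illustrative, not taken from the post):

    # Autodetected topology, as described above:
    AutoDetect=nvml

    # Roughly what NVML autodetection produces for four PCIe A6000s where
    # GPU0<->GPU1 and GPU2<->GPU3 are NVLink pairs (-1 marks the device itself,
    # other entries count links to each peer; values illustrative):
    #Name=gpu Type=a6000 File=/dev/nvidia0 Cores=0-15  Links=-1,1,0,0
    #Name=gpu Type=a6000 File=/dev/nvidia1 Cores=0-15  Links=1,-1,0,0
    #Name=gpu Type=a6000 File=/dev/nvidia2 Cores=16-31 Links=0,0,-1,1
    #Name=gpu Type=a6000 File=/dev/nvidia3 Cores=16-31 Links=0,0,1,-1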

[slurm-users] Enforcing relative resource restrictions in submission script

2024-02-27 Thread Matthew R. Baney via slurm-users
Hello Slurm users, I'm trying to write a check in our job_submit.lua script that enforces relative resource requirements such as disallowing more than 4 CPUs or 48GB of memory per GPU. The QOS itself has a MaxTRESPerJob of cpu=32,gres/gpu=8,mem=384G (roughly one full node), but we're looking to pr[...]
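
A minimal sketch of such a check in job_submit.lua, assuming GPUs are requested per node (--gres=gpu:N or --gpus-per-node). The field names follow the job_submit/lua plugin, but the limits, the simplified TRES parsing, and the handling of unset fields are all simplifications, not the poster's actual script:

    -- Illustrative limit only
    local MAX_CPUS_PER_GPU = 4

    -- Pull a per-node GPU count out of strings like "gres:gpu:2" or
    -- "gres:gpu:a6000:2"; returns 0 if no GPUs were requested this way
    local function gpus_per_node(job_desc)
        local spec = job_desc.tres_per_node
        if spec == nil then return 0 end
        return tonumber(spec:match("gpu[:%w]*:(%d+)$")) or 0
    end

    function slurm_job_submit(job_desc, part_list, submit_uid)
        local gpus = gpus_per_node(job_desc)
        if gpus > 0 then
            -- Note: unset numeric fields can come back as Slurm's NO_VAL
            -- sentinels rather than nil; a production check must handle that.
            local cpus_per_task = job_desc.cpus_per_task or 1
            local tasks_per_node = job_desc.ntasks_per_node or 1
            if cpus_per_task * tasks_per_node > gpus * MAX_CPUS_PER_GPU then
                slurm.log_user(string.format(
                    "Requests are limited to %d CPUs per GPU", MAX_CPUS_PER_GPU))
                return slurm.ERROR
            end
        end
        -- A memory-per-GPU check would go here too, but pn_min_memory needs
        -- the MEM_PER_CPU flag handled, so it is omitted from this sketch.
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end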

[slurm-users] Preempt jobs to stay within account TRES limits?

2022-10-21 Thread Matthew R. Baney
Hello, I have noticed that jobs submitted to non-preemptable partitions (PreemptType = preempt/partition_prio and PreemptMode = REQUEUE) under accounts with GrpTRES limits will become pending with AssocGrpGRES as the reason when the account is up against the relevant limit, even when there are oth[...]
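
For context, a sketch of the kind of setup being described; partition layout, node list, account name, and the GPU limit are illustrative, not taken from the original post:

    # slurm.conf: preemption driven by partition priority, requeueing victims
    PreemptType=preempt/partition_prio
    PreemptMode=REQUEUE
    PartitionName=high PriorityTier=10 Nodes=gpu[01-08] Default=NO
    PartitionName=low  PriorityTier=1  Nodes=gpu[01-08] Default=YES

    # Account-level limit of the kind that triggers the AssocGrpGRES reason:
    sacctmgr modify account myaccount set GrpTRES=gres/gpu=8

    # Pending jobs and their reasons can be inspected with:
    squeue -A myaccount -t PENDING -o "%.10i %.30R"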