[slurm-users] Problem building slurm with PMIx
Hi!

I manage a small CentOS 8 cluster using Slurm 20.11.7-1 and OpenMPI built from source.
- I know this OS is not maintained any more and I need to negotiate downtime to reinstall.
- I know Slurm 20.11.7 has a security issue (I built it from source some years ago with "rpmbuild -ta --with mysql --with hwloc slurm-20.11.7.tar.bz2") and I should update.

All was running fine until I added a GPU node and the Nvidia SDK. This SDK provides a GPU-aware OpenMPI 3 implementation, but I'm unable to launch an intra-node parallel job with it using srun:

--
The application appears to have been direct launched using "srun", but OMPI was not built with SLURM's PMI support and therefore cannot execute. There are several options for building PMI support under SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or PMI-2 support. SLURM builds PMI-1 by default, or you can manually install PMI-2. You must then build Open MPI using --with-pmi pointing to the SLURM PMI library location.

Please configure as appropriate and try again.
--

I checked with "srun --mpi=list" and got no pmix:

srun: MPI types are...
srun: pmi2
srun: cray_shasta
srun: none

So I decided to build the rpms from slurm-20.11.9.tar.bz2, as I had done previously for 20.11.7, and update. I first installed pmix-2.1.1-1 from source, as I had no pmix-devel rpm in my local CentOS 8 repo:

rpmbuild --rebuild pmix-2.1.1-1.el8.src.rpm
dnf install pmix-devel-2.1.1-1.el8.x86_64.rpm pmix-2.1.1-1.el8.x86_64.rpm

Then I built Slurm from slurm-20.11.9.tar.bz2 (just changing python3 to python38 in the spec file):

rpmbuild -ta --with mysql --with hwloc --with pmix slurm-20.11.9.tar.bz2

And then tried to install these packages on the GPU node:

dnf install slurm-slurmd-20.11.9-1.el8.x86_64.rpm slurm-20.11.9-1.el8.x86_64.rpm slurm-devel-20.11.9-1.el8.x86_64.rpm slurm-libpmi-20.11.9-1.el8.x86_64.rpm

But I get this strange error:

Error:
 Problem: conflicting requests
  - nothing provides pmix = 20.11.9 needed by slurm-slurmd-20.11.9-1.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

Why this requirement on pmix with the Slurm version number? Am I wrong somewhere?

Thanks for your help

Patrick
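A quick way to narrow this down is to look at the dependency actually written into the rebuilt slurmd package, and at the MPI plugin list once it is installed (a small diagnostic sketch; file names as built above):

# inspect the pmix requirement recorded in the rebuilt rpm
rpm -qp --requires slurm-slurmd-20.11.9-1.el8.x86_64.rpm | grep -i pmix

# after installation on the node, pmix should appear in the plugin list
srun --mpi=list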
[slurm-users] Re: Question about IB and Ethernet networks
Hi Josef,

On a cluster using PXE boot and automatic (re)installation of nodes, I do not think you can do this with IPoIB on an InfiniBand interface. On my cluster nodes I have:
- 1 Gb Ethernet network for OOB
- 10 or 25 Gb Ethernet for session, automatic deployment and management
- IB HDR100 for MPI

Today data storage is reached via IPoIB on one cluster and via Ethernet for the second one, because it is not located in the same building (old IB QDR setup). I'm working on deploying an additional Ceph storage cluster and it will also require Ethernet, as there is no IB on these new storage nodes (25 Gb Ethernet only).

These clusters are small (300-400 cores each). The HDR100 IB network is shared with a third cluster in the laboratory (shared purchase for the switch). So different technologies can be required together. This represents an investment, but with cost amortization over a decade or more (my QDR setup is from 2012 and still in production).

Patrick

On 26/02/2024 at 08:59, Josef Dvoracek via slurm-users wrote:

> Just looking for some feedback, please. Is this OK? Is there a better way? I'm tempted to spec all new HPCs with only a high speed (200Gbps) IB network.

Well you need Ethernet for OOB management (bmc/ipmi/ilo/whatever) anyway.. or?

cheers

josef

On 25. 02. 24 21:12, Dan Healy via slurm-users wrote:

This question is not slurm-specific, but it might develop into that.
[slurm-users] First setup of slurm with a GPU node
Hi,

I'm using Slurm on a small 8-node cluster. I've recently added one GPU node with two Nvidia A100, one with 40 GB of RAM and one with 80 GB. As usage of this GPU resource increases, I would like to manage it with GRES to avoid usage conflicts. But at this time my setup does not work, as I can reach a GPU without reserving it:

srun -n 1 -p tenibre-gpu ./a.out

can use a GPU even if the reservation does not specify this resource (checked by running nvidia-smi on the node). "tenibre-gpu" is a Slurm partition with only this GPU node.

From the documentation I've created a gres.conf file; it is propagated on all the nodes (9 compute nodes, 1 login node and the management node) and slurmd has been restarted. gres.conf is:

## GPU setup on tenibre-gpu-0
NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0 Flags=nvidia_gpu_env
NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1 Flags=nvidia_gpu_env

In slurm.conf I have checked these flags:

## Basic scheduling
SelectTypeParameters=CR_Core_Memory
SchedulerType=sched/backfill
SelectType=select/cons_tres

## Generic resources
GresTypes=gpu

## Nodes list
Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN

#partitions
PartitionName=tenibre-gpu MaxTime=48:00:00 DefaultTime=12:00:00 DefMemPerCPU=4096 MaxMemPerCPU=8192 Shared=YES State=UP Nodes=tenibre-gpu-0
...

Maybe I've missed something? I'm running Slurm 20.11.7-1.

Thanks for your advice.

Patrick
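One quick check is to ask the controller what GRES it believes the node has; if the Gres and CfgTRES fields come back empty, the gres.conf entries are not being matched with slurm.conf (a diagnostic sketch using standard scontrol/sinfo output):

scontrol show node tenibre-gpu-0 | grep -Ei 'gres|cfgtres'
sinfo -p tenibre-gpu -o '%N %G'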
[slurm-users] Re: First setup of slurm with a GPU node
On 13/11/2024 at 15:45, Roberto Polverelli Monti via slurm-users wrote:

Hello Patrick,

On 11/13/24 12:01 PM, Patrick Begou via slurm-users wrote:

As usage of this GPU resource increases I would like to manage it with GRES to avoid usage conflicts. But at this time my setup does not work, as I can reach a GPU without reserving it:

srun -n 1 -p tenibre-gpu ./a.out

can use a GPU even if the reservation does not specify this resource (checked by running nvidia-smi on the node). "tenibre-gpu" is a Slurm partition with only this GPU node.

I think what you're looking for is the ConstrainDevices parameter in cgroup.conf. See here:
- https://slurm.schedmd.com/archive/slurm-20.11.7/cgroup.conf.html

Best,

Hi Roberto,

thanks for pointing to this parameter. I set it, updated all the nodes and restarted slurmd everywhere, but it does not change the behavior. However, when looking in the slurmd log on the GPU node I notice this information:

[2024-11-13T16:41:08.434] debug:  CPUs:32 Boards:1 Sockets:8 CoresPerSocket:4 ThreadsPerCore:1
[2024-11-13T16:41:08.434] debug:  gres/gpu: init: loaded
[2024-11-13T16:41:08.434] WARNING: A line in gres.conf for GRES gpu:A100-40 has 1 more configured than expected in slurm.conf. Ignoring extra GRES.
[2024-11-13T16:41:08.434] WARNING: A line in gres.conf for GRES gpu:A100-80 has 1 more configured than expected in slurm.conf. Ignoring extra GRES.
[2024-11-13T16:41:08.434] debug:  gpu/generic: init: init: GPU Generic plugin loaded
[2024-11-13T16:41:08.434] topology/none: init: topology NONE plugin loaded
[2024-11-13T16:41:08.434] route/default: init: route default plugin loaded
[2024-11-13T16:41:08.434] CPU frequency setting not configured for this node
[2024-11-13T16:41:08.434] debug:  Resource spec: No specialized cores configured by default on this node
[2024-11-13T16:41:08.434] debug:  Resource spec: Reserved system memory limit not configured for this node
[2024-11-13T16:41:08.434] debug:  Reading cgroup.conf file /etc/slurm/cgroup.conf
[2024-11-13T16:41:08.434] error: MaxSwapPercent value (0.0%) is not a valid number
[2024-11-13T16:41:08.436] debug:  task/cgroup: init: core enforcement enabled
[2024-11-13T16:41:08.437] debug:  task/cgroup: task_cgroup_memory_init: task/cgroup/memory: total:257281M allowed:100%(enforced), swap:0%(enforced), max:100%(257281M) max+swap:100%(514562M) min:30M kmem:100%(257281M permissive) min:30M swappiness:0(unset)
[2024-11-13T16:41:08.437] debug:  task/cgroup: init: memory enforcement enabled
[2024-11-13T16:41:08.438] debug:  task/cgroup: task_cgroup_devices_init: unable to open /etc/slurm/cgroup_allowed_devices_file.conf: No such file or directory
[2024-11-13T16:41:08.438] debug:  task/cgroup: init: device enforcement enabled
[2024-11-13T16:41:08.438] debug:  task/cgroup: init: task/cgroup: loaded
[2024-11-13T16:41:08.438] debug:  auth/munge: init: Munge authentication plugin loaded

So something is wrong in my gres.conf file I think, maybe because I try to configure 2 different devices on the node? gres.conf is:

## GPU setup on tenibre-gpu-0
NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0 Flags=nvidia_gpu_env
NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1 Flags=nvidia_gpu_env

Patrick
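For reference, a minimal cgroup.conf enabling device constraint in the 20.11 series might look like the sketch below (an illustration based on the cgroup.conf page linked above, not the poster's actual file):

# /etc/slurm/cgroup.conf
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
# Restrict each job to the GRES devices it was allocated
ConstrainDevices=yes
# With ConstrainDevices=yes this Slurm version also consults AllowedDevicesFile
# (default /etc/slurm/cgroup_allowed_devices_file.conf), which explains the
# "unable to open" debug line in the slurmd log above.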
[slurm-users] Re: First setup of slurm with a GPU node
Hi Benjamin,

Yes, I saw this in an archived discussion too and I've added these parameters. A little bit tricky to do as my setup is deployed via Ansible. But with this setup I'm not able to request a GPU at all. All these tests are failing and Slurm does not accept the job:

srun -n 1 -p tenibre-gpu --gres=gpu:A100-40 ./a.out
srun -n 1 -p tenibre-gpu --gres=gpu:A100-40:1 ./a.out
srun -n 1 -p tenibre-gpu --gpus-per-node=A100-40:1 ./a.out
srun -n 1 -p tenibre-gpu --gpus-per-node=1 ./a.out
srun -n 1 -p tenibre-gpu --gres=gpu:1 ./a.out

Maybe some restriction on the GPU type field with the "minus" sign? No idea. But launching a GPU code without reserving a GPU now fails at execution time on the node, so a first step is done!

Maybe I should upgrade my Slurm version from 20.11 to the latest. But I had to set the cluster back in production without the GPU setup this evening.

Patrick

On 13/11/2024 at 17:31, Benjamin Smith via slurm-users wrote:

Hi Patrick,

You're missing a Gres= on your node in your slurm.conf:

Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN Gres=gpu:A100-40:1,gpu:A100-80:1

Ben
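Putting the pieces of this thread together, the node definition and gres.conf would pair up roughly as below (a sketch assembled from the gres.conf posted earlier and the Gres= attribute Benjamin suggests; device counts and types are as described in the thread):

# slurm.conf
Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN Gres=gpu:A100-40:1,gpu:A100-80:1

# gres.conf on tenibre-gpu-0
NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0 Flags=nvidia_gpu_env
NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1 Flags=nvidia_gpu_env

A job should then only see a device when it explicitly asks for one, e.g. srun --gres=gpu:1.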
[slurm-users] Re: sinfo not listing any partitions
Hi Kent,

on your management node could you run:

systemctl status slurmctld

and check your 'NodeName=' and 'PartitionName=...' lines in /etc/slurm.conf? In my slurm.conf I have a more detailed node description, and the NodeName keyword starts with an upper-case letter (I don't know if slurm.conf is case sensitive):

NodeName=kareline-0-[0-3] Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=47900

It looks like your node description is not understood by Slurm.

Patrick

On 27/11/2024 at 17:46, Ryan Novosielski via slurm-users wrote:

At this point, I'd probably crank up the logging some and see what it's saying in slurmctld.log.

--
Ryan Novosielski - novos...@rutgers.edu
Sr. Technologist, Office of Advanced Research Computing - MSB A555B, Newark

On Nov 27, 2024, at 11:38, Kent L. Hanson wrote:

Hey Ryan,

I have restarted the slurmctld and slurmd services several times. I hashed the slurm.conf files; they are the same. I ran "sinfo -a" as root with the same result.

Thanks,
Kent

From: Ryan Novosielski
Sent: Wednesday, November 27, 2024 9:31 AM
To: Kent L. Hanson
Cc: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] sinfo not listing any partitions

If you're sure you've restarted everything after the config change, are you also sure that you don't have that stuff hidden from your current user? You can try -a to rule that out. Or run as root.

On Nov 27, 2024, at 09:56, Kent L. Hanson via slurm-users wrote:

I am doing a new install of Slurm 24.05.3. I have all the packages built and installed on the head node and compute node with the same munge.key, slurm.conf, and gres.conf files. I was able to run the munge and unmunge commands to test munge successfully. Time is synced with chronyd. I can't seem to find any useful errors in the logs. For some reason, when I run sinfo no nodes are listed; I just see the headers for each column. Has anyone seen this or know what a next step of troubleshooting would be? I'm new to this and not sure where to go from here. Thanks for any and all help!
The odd output I am seeing:

[username@headnode ~] sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
(Nothing is output showing status of partitions or nodes)

slurm.conf:

ClusterName=slurmkvasir
SlurmctldHost=kadmin2
MpiDefault=none
ProctrackType=proctrack/cgroup
PrologFlags=contain
ReturnToService=2
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/cgroup
MinJobAge=600
SchedulerType=sched/backfill
SelectType=select/cons_tres
PriorityType=priority/multifactor
AccountingStorageHost=localhost
AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu,cpu,node
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmLogFile=/var/log/slurm/slurmd.log
nodeName=k[001-448]
PartitionName=default Nodes=k[001-448] Default=YES MaxTime=INFINITE State=up

slurmctld.log:

Error: Configured MailProg is invalid
Slurmctld version 24.05.3 started on cluster slurmkvasir
accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 8617
Error: read_slurm_conf: default partition not set.
Recovered state of 448 nodes
Down nodes: k[002-448]
Recovered information about 0 jobs
Recovered state of 0 reservations
read_slurm_conf: backup_controller not specified
select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure
Running as primary controller

slurmd.log:

Error: Node configuration differs from hardware: CPUs=1:40(hw) Boards=1:1(hw) SocketsPerBoard=1:2(hw) CoresPerSocket=1:20(hw) ThreadsPerCore=1:1(hw)
CPU frequency setting not configured for this node
Slurmd version 24.05.3 started
Slurmd started on Wed, 27 Nov 2024 06:51:03 -0700
CPUs=1 Boards=1 Cores=1 Threads=1 Memory=192030 TmpDisk=95201 Uptime=166740 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
Error: _forward_thread: failed to k019 (10.
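For comparison, given the hardware that slurmd reports above (2 sockets, 20 cores per socket, 1 thread per core, ~192 GB of RAM), a more explicit node and partition definition would look something like the sketch below (RealMemory rounded down is an assumption, not a measured value for these nodes):

NodeName=k[001-448] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=190000 State=UNKNOWN
PartitionName=default Nodes=k[001-448] Default=YES MaxTime=INFINITE State=UP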
[slurm-users] slurm releases
Hi Slurm team,

I would like to ask for some clarification about Slurm releases. Why are two versions of Slurm available? I'm speaking of 24.05.7 versus 24.11.3 on https://www.schedmd.com/slurm-support/release-announcements and in the announcements made on this list.

I'm managing small clusters in a French public research lab and I successfully updated (let's say re-installed, as it was very old) an old one with slurm-24.05.3 some months ago. Now I'm trying to deploy a second cluster with the same approach and I've selected 24.11.3-1 (but why not use 24.05.7?).

Moreover I have some difficulties deploying 24.11.3. I've built the rpms with:

rpmbuild \
    --define '_prefix /usr' \
    --with pmix --with ucx --with mysql --with hwloc \
    -ta slurm-24.11.3.tar.bz2

but some packages won't install:

$ dnf list --showduplicates slurm-slurmd
Last metadata expiration check: 0:33:15 ago on Tue Apr 1 18:03:07 2025.
Available Packages
slurm-slurmd.x86_64    24.11.3-1.el9    bb-local

$ sudo dnf install slurm-slurmd
Last metadata expiration check: 0:33:03 ago on Tue Apr 1 18:03:25 2025.
Error:
 Problem: conflicting requests
  - nothing provides is needed by slurm-slurmd-24.11.3-1.el9.x86_64 from bb-local
  - nothing provides installed needed by slurm-slurmd-24.11.3-1.el9.x86_64 from bb-local
  - nothing provides not needed by slurm-slurmd-24.11.3-1.el9.x86_64 from bb-local
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

I think I will try to build 24.05.7 or 24.05.3 as a next step, but I'm interested in any advice.

Thank you

Patrick
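One way to see where the "is", "installed" and "not" requirements come from is to dump the dependency metadata of the freshly built package before trying to install it (a diagnostic sketch; adjust the path to wherever rpmbuild placed the rpm):

rpm -qp --requires ~/rpmbuild/RPMS/x86_64/slurm-slurmd-24.11.3-1.el9.x86_64.rpm

The three words read like fragments of an English sentence, which may indicate that a generated Requires: line in the spec picked up stray text during the build.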
[slurm-users] Re: Setting QoS with slurm 24.05.7
Hi Michael,

thanks for your explanation. I understand that setting "MaxTRESMinsPerJob=cpu=172800" will allow (in my case):
- a job on the full cluster for 6 hours
- a job on half of the cluster for 12 hours

But if I do not want the same user to run, at the same time, 2 jobs on half of the cluster for 12 hours each (and fill the cluster for a long time), how can I limit his running jobs to 172800 CPU-minutes? I was looking for something like "MaxTRESMinsPerUser" but do not find such a limit.

Patrick

On 18/04/2025 at 17:17, Michael Gutteridge wrote:

Hi

I think you want one of the "MaxTRESMins*" options:

MaxTRESMins=TRES=<minutes>[,TRES=<minutes>,...]
MaxTRESMinsPJ=TRES=<minutes>[,TRES=<minutes>,...]
MaxTRESMinsPerJob=TRES=<minutes>[,TRES=<minutes>,...]
    Maximum number of TRES minutes each job is able to use in this association. This is overridden if set directly on a user. Default is the cluster's limit. To clear a previously set value use the modify command with a new value of -1 for each TRES id.
    - sacctmgr(1)

The "MaxCPUs" is a limit on the number of CPUs the association can use.

--
Michael
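For reference, Michael's first suggestion would be applied to the QoS roughly like this (a sketch reusing the QoS name from the thread):

sacctmgr modify qos workflowlimit set MaxTRESMinsPerJob=cpu=172800
sacctmgr show qos workflowlimit

Note that a per-job limit caps each job individually, not a user's running total, which is why the thread eventually settles on MaxTRESRunMinsPerUser.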
[slurm-users] Setting QoS with slurm 24.05.7
Hi all,

I'm trying to set up a QoS on a small 5-node cluster running Slurm 24.05.7. My goal is to limit the resources on a (time x number of cores) basis, to avoid one large job requesting all the resources for too long a time. I've read https://slurm.schedmd.com/qos.html and some discussions, but my setup is still not working.

I think I need to set:

MaxCPUsPerJob=172800
MaxWallDurationPerJob=48:00:00
Flags=DenyOnLimit,OverPartQOS

for:
- 12h max for 240 cores => (12*240*60 = 172800 mn)
- no job can exceed 2 days
- do not accept jobs out of these limits.

What I've done:

1) Create the QoS:

sudo sacctmgr add qos workflowlimit \
     MaxWallDurationPerJob=48:00:00 \
     MaxCPUsPerJob=172800 \
     Flags=DenyOnLimit,OverPartQOS

2) Check:

sacctmgr show qos Name=workflowlimit format=Name%16,MaxTRES,MaxWall
            Name    MaxTRES    MaxWall
---------------- ---------- ----------
   workflowlimit cpu=172800 2-00:00:00

3) Set the QoS for the account "most", which is the default account for the users:

sudo sacctmgr modify account name=most set qos=workflowlimit

4) Check:

$ sacctmgr show assoc format=account,cluster,user,qos
   Account    Cluster       User           QOS
---------- ---------- ---------- -------------
      root     osorno                   normal
      root     osorno       root        normal
      legi     osorno                   normal
      most     osorno            workflowlimit
      most     osorno      begou workflowlimit

5) Modify slurm.conf with:

AccountingStorageEnforce=limits,qos

and propagate on the 5 nodes and the front end (done via Ansible).

6) Check:

clush -b -w osorno-fe,osorno,osorno-0-[0-4] 'grep AccountingStorageEnforce /etc/slurm/slurm.conf'
---------------
osorno,osorno-0-[0-4],osorno-fe (7)
---------------
AccountingStorageEnforce=limits,qos

7) Restart slurmd on all the compute nodes and slurmctld + slurmdbd on the management node.

But I can still request 400 cores for 24 hours:

[begou@osorno ~]$ srun -n 400 -t 24:0:0 --pty bash
bash-5.1$ squeue
JOBID PARTITION  NAME   USER ST  TIME          START_TIME  TIME_LIMIT CPUS NODELIST(REASON)
  147     genoa  bash  begou  R  0:03 2025-04-18T16:52:11  1-00:00:00  400 osorno-0-[0-4]

So I must have missed something?

My partition (I've only one) in slurm.conf is:

PartitionName=genoa State=UP Default=YES MaxTime=48:00:00 DefaultTime=24:00:00 Shared=YES OverSubscribe=NO Nodes=osorno-0-[0-4]

Thanks

Patrick
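When limits like these seem to be ignored, it can help to confirm what the controller actually has loaded for the QoS (a diagnostic sketch using standard commands; the output of the first one dumps the in-memory association/QOS table):

scontrol show assoc_mgr flags=QOS
sacctmgr show qos workflowlimit format=Name%16,MaxTRES,MaxWall,Flags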
[slurm-users] Re: Setting QoS with slurm 24.05.7
Yes Michael! With this setup it does the job. There are so many tuning possibilities in Slurm that I had missed this one.

Thank you very much.

Patrick

On 22/04/2025 at 16:30, Michael Gutteridge wrote:

Whoops, my mistake, sorry. Is this closer to what you want:

MaxTRESRunMinsPU
MaxTRESRunMinsPerUser
    Maximum number of TRES minutes each user is able to use. This takes into consideration the time limit of running jobs. If the limit is reached, no new jobs are started until other jobs finish to allow time to free up.

https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESRunMinsPU

- Michael
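For completeness, the limit Michael points to would be attached to the existing QoS roughly like this (a sketch, reusing the QoS name and the 172800 CPU-minute budget from the thread):

sacctmgr modify qos workflowlimit set MaxTRESRunMinsPerUser=cpu=172800
sacctmgr show qos workflowlimit

With this in place, a single user's running jobs cannot together hold more than that CPU-minute budget; additional jobs stay pending until running jobs finish and free up enough of it.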