[slurm-users] Problem building slurm with PMIx

2024-02-14 Thread Patrick Begou via slurm-users

Hi !

I manage a small CentOS8 cluster using Slurm 20.11.7-1 and OpenMPI built
from sources.
- I know this OS is not maintained any more and I need to negotiate
downtime to reinstall.
- I know Slurm 20.11.7 has security issues (I built it from source
some years ago with rpmbuild -ta --with mysql --with hwloc
slurm-20.11.7.tar.bz2) and I should update.


All was running fine until I added a GPU node and the Nvidia SDK. This SDK
provides a GPU-aware OpenMPI 3 implementation, but I'm unable to launch
an intra-node parallel job with it using srun:


--
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--

I check with "srun --mpi=list" and got no pmx.
srun: MPI types are...
srun: pmi2
srun: cray_shasta
srun: none

So I decided to build the rpms from slurm-20.11.9.tar.bz2, as I had done
previously for 20.11.7, and to update.
I first installed pmix-2.1.1-1 from source, as I had no pmix-devel rpm in
my local CentOS8 repo:

rpmbuild --rebuild pmix-2.1.1-1.el8.src.rpm
dnf install pmix-devel-2.1.1-1.el8.x86_64.rpm pmix-2.1.1-1.el8.x86_64.rpm

Then I built Slurm from slurm-20.11.9.tar.bz2 (just changing python3 to
python38 in the spec file):

rpmbuild -ta --with mysql --with hwloc --with pmix slurm-20.11.9.tar.bz2

And then tried to install these packages on the GPU node:
dnf install slurm-slurmd-20.11.9-1.el8.x86_64.rpm 
slurm-20.11.9-1.el8.x86_64.rpm slurm-devel-20.11.9-1.el8.x86_64.rpm 
slurm-libpmi-20.11.9-1.el8.x86_64.rpm


But I get this strange error:

Error:
 Problem: conflicting requests
  - nothing provides pmix = 20.11.9 needed by 
slurm-slurmd-20.11.9-1.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' 
to use not only best candidate packages)


Why this requirement on pmix with the Slurm version number? Am I wrong
somewhere?
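
In case it helps, I suppose the dependencies that rpmbuild generated in
the package can be listed with something like this (querying the freshly
built rpm directly):

   rpm -qpR slurm-slurmd-20.11.9-1.el8.x86_64.rpm | grep -i pmix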



Thanks for your help


Patrick


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Question about IB and Ethernet networks

2024-03-03 Thread Patrick Begou via slurm-users

Hi Josef,

On a cluster using PXE boot and automatic (re)installation of nodes, I
do not think you can do this with IPoIB on an InfiniBand interface.


On my cluster nodes I have:
- 1Gb ethernet network for OOB
- 10 or 25Gb ethernet for session, automatic deployment and management
- IB HDR100 for MPI

Today, data storage is reached via IPoIB on one cluster and via Ethernet
on the second one, because it is not located in the same building (old
IB QDR setup).
I'm working on deploying an additional Ceph storage cluster, and it will
also require Ethernet as there is no IB on these new storage nodes (25Gb
Ethernet only).


These clusters are small (300-400 cores each). The HDR100 IB network is 
shared by a third cluster in the laboratory (shared purchase for the 
switch).


So different technologies can be required together. This represents an
investment, but with cost amortization over a decade or more (my QDR
setup is from 2012 and still in production).


Patrick

On 26/02/2024 at 08:59, Josef Dvoracek via slurm-users wrote:
> Just looking for some feedback, please. Is this OK? Is there a better way?


> I’m tempted to spec all new HPCs with only a high speed (200Gbps) IB network,


Well, you need Ethernet for OOB management (bmc/ipmi/ilo/whatever)
anyway... or?


cheers

josef

On 25. 02. 24 21:12, Dan Healy via slurm-users wrote:


This question is not slurm-specific, but it might develop into that.





--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] First setup of slurm with a GPU node

2024-11-13 Thread Patrick Begou via slurm-users

Hi,

I'm using Slurm on a small 8-node cluster. I've recently added one GPU
node with two Nvidia A100s, one with 40GB of RAM and one with 80GB.


As use of this GPU resource increases, I would like to manage it with
GRES to avoid usage conflicts. But at this time my setup does not work,
as I can reach a GPU without reserving it:


   srun -n 1 -p tenibre-gpu ./a.out

can use a GPU even if the reservation does not specify this resource
(checked by running nvidia-smi on the node). "tenibre-gpu" is a Slurm
partition containing only this GPU node.


Following the documentation, I've created a gres.conf file; it is
propagated to all the nodes (9 compute nodes, 1 login node and the
management node) and slurmd has been restarted.


gres.conf is:

   ## GPU setup on tenibre-gpu-0
   NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0 Flags=nvidia_gpu_env
   NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1 Flags=nvidia_gpu_env

In slurm.conf I have checked these flags:

   ## Basic scheduling
   SelectTypeParameters=CR_Core_Memory
   SchedulerType=sched/backfill
   SelectType=select/cons_tres

   ## Generic resources
   GresTypes=gpu

   ## Nodes list
   
   Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16
   ThreadsPerCore=1 State=UNKNOWN
   

   #partitions
   PartitionName=tenibre-gpu MaxTime=48:00:00 DefaultTime=12:00:00
   DefMemPerCPU=4096 MaxMemPerCPU=8192 Shared=YES  State=UP
   Nodes=tenibre-gpu-0
   ...
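
For reference, this is how I expect to request one of the GPUs once the
GRES setup works (my reading of the gres syntax, not yet verified here):

   srun -n 1 -p tenibre-gpu --gres=gpu:A100-40:1 ./a.out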



Maybe I've missed something? I'm running Slurm 20.11.7-1.

Thanks for your advice.

Patrick

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: First setup of slurm with a GPU node

2024-11-13 Thread Patrick Begou via slurm-users

On 13/11/2024 at 15:45, Roberto Polverelli Monti via slurm-users wrote:

Hello Patrick,

On 11/13/24 12:01 PM, Patrick Begou via slurm-users wrote:
As use of this GPU resource increases, I would like to manage it with
GRES to avoid usage conflicts. But at this time my setup does not work,
as I can reach a GPU without reserving it:


    srun -n 1 -p tenibre-gpu ./a.out

can use a GPU even if the reservation does not specify this resource
(checked by running nvidia-smi on the node). "tenibre-gpu" is a
Slurm partition containing only this GPU node.


I think what you're looking for is the ConstrainDevices parameter in 
cgroup.conf.


See here:
- https://slurm.schedmd.com/archive/slurm-20.11.7/cgroup.conf.html

Best,


Hi Roberto,

thanks for pointing me to this parameter. I set it, updated all the nodes
and restarted slurmd everywhere, but it does not change the behavior.
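For reference, my cgroup.conf is along these lines (a sketch: the
Constrain* values match the enforcement reported as enabled in the log
below; CgroupAutomount is an assumption):

   CgroupAutomount=yes
   ConstrainCores=yes
   ConstrainRAMSpace=yes
   ConstrainSwapSpace=yes
   ConstrainDevices=yes
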
However, when looking at the slurmd log on the GPU node I noticed this
information:



[2024-11-13T16:41:08.434] debug:  CPUs:32 Boards:1 Sockets:8 
CoresPerSocket:4 ThreadsPerCore:1

[2024-11-13T16:41:08.434] debug: gres/gpu: init: loaded
[2024-11-13T16:41:08.434] WARNING: A line in gres.conf for GRES 
gpu:A100-40 has 1 more configured than expected in slurm.conf. Ignoring 
extra GRES.
[2024-11-13T16:41:08.434] WARNING: A line in gres.conf for GRES 
gpu:A100-80 has 1 more configured than expected in slurm.conf. Ignoring 
extra GRES.
[2024-11-13T16:41:08.434] debug: gpu/generic: init: init: GPU Generic 
plugin loaded

[2024-11-13T16:41:08.434] topology/none: init: topology NONE plugin loaded
[2024-11-13T16:41:08.434] route/default: init: route default plugin loaded
[2024-11-13T16:41:08.434] CPU frequency setting not configured for this node
[2024-11-13T16:41:08.434] debug:  Resource spec: No specialized cores 
configured by default on this node
[2024-11-13T16:41:08.434] debug:  Resource spec: Reserved system memory 
limit not configured for this node
[2024-11-13T16:41:08.434] debug:  Reading cgroup.conf file 
/etc/slurm/cgroup.conf
[2024-11-13T16:41:08.434] error: MaxSwapPercent value (0.0%) is not a 
valid number

[2024-11-13T16:41:08.436] debug: task/cgroup: init: core enforcement enabled
[2024-11-13T16:41:08.437] debug: task/cgroup: task_cgroup_memory_init: 
task/cgroup/memory: total:257281M allowed:100%(enforced), 
swap:0%(enforced), max:100%(257281M) max+swap:100%(514562M) min:30M 
kmem:100%(257281M permissive) min:30M swappiness:0(unset)
[2024-11-13T16:41:08.437] debug: task/cgroup: init: memory enforcement 
enabled
[2024-11-13T16:41:08.438] debug: task/cgroup: task_cgroup_devices_init: 
unable to open /etc/slurm/cgroup_allowed_devices_file.conf: No such file 
or directory
[2024-11-13T16:41:08.438] debug: task/cgroup: init: device enforcement 
enabled

[2024-11-13T16:41:08.438] debug: task/cgroup: init: task/cgroup: loaded
[2024-11-13T16:41:08.438] debug: auth/munge: init: Munge authentication 
plugin loaded


So something is wrong in my gres.conf file, I think, maybe because I try
to configure 2 different devices on the node?


## GPU setup on tenibre-gpu-0
NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0 Flags=nvidia_gpu_env
NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1 Flags=nvidia_gpu_env


Patrick

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: First setup of slurm with a GPU node

2024-11-13 Thread Patrick Begou via slurm-users

Hi Benjamin,

Yes, I saw this in an archived discussion too and I've added these
parameters. A little bit tricky to do, as my setup is deployed via
Ansible. But with this setup I'm not able to request a GPU at all. All
these tests are failing and Slurm does not accept the job:


srun -n 1 -p tenibre-gpu --gres=gpu:A100-40 ./a.out
srun -n 1 -p tenibre-gpu --gres=gpu:A100-40:1 ./a.out
srun -n 1 -p tenibre-gpu --gpus-per-node=A100-40:1 ./a.out
srun -n 1 -p tenibre-gpu --gpus-per-node=1 ./a.out
srun -n 1 -p tenibre-gpu --gres=gpu:1 ./a.out

Maybe there are some restrictions on the GPU type field with the "minus"
sign? No idea. But launching a GPU code without reserving a GPU now fails
at execution time on the node, so a first step is done!
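
One more check I still need to run, to see whether the controller has
registered any GRES on this node at all:

   scontrol show node tenibre-gpu-0 | grep -i gres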


Maybe I should upgrade my Slurm version from 20.11 to the latest, but I
had to put the cluster back into production without the GPU setup this
evening.


Patrick

On 13/11/2024 at 17:31, Benjamin Smith via slurm-users wrote:


Hi Patrick,

You're missing a Gres= on your node in your slurm.conf:

Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16 
ThreadsPerCore=1 State=UNKNOWN Gres=gpu:A100-40:1,gpu:A100-80:1

Ben



--
Benjamin Smith
Computing Officer, AT-7.12a
Research and Teaching Unit
School of Informatics, University of Edinburgh
The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336. Is e buidheann 
cartha

[slurm-users] Re: sinfo not listing any partitions

2024-11-28 Thread Patrick Begou via slurm-users

Hi Kent,

on your management node, could you run:
systemctl status slurmctld

and check your 'Nodename=' and 'PartitionName=...' lines in
/etc/slurm.conf? In my slurm.conf I have a more detailed node description,
and the NodeName keyword starts with an upper-case letter (I don't know
if slurm.conf is case sensitive):


NodeName=kareline-0-[0-3]  Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 
RealMemory=47900


and it looks like your node description is not understood by Slurm.
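
If it helps, here is a rough guess at what the node line could look like,
based on the hardware values reported in your slurmd.log (untested):

NodeName=k[001-448] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=192030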

Patrick


On 27/11/2024 at 17:46, Ryan Novosielski via slurm-users wrote:
At this point, I’d probably crank up the logging some and see what 
it’s saying in slurmctld.log.


--
#BlackLivesMatter

|| \\UTGERS, |---*O*---
||_// the State |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ 
RBHS Campus
||  \\    of NJ | Office of Advanced Research Computing - MSB 
A555B, Newark

     `'


On Nov 27, 2024, at 11:38, Kent L. Hanson  wrote:

Hey Ryan,
I have restarted the slurmctld and slurmd services several times. I 
hashed the slurm.conf files. They are the same. I ran “sinfo -a” as 
root with the same result.

Thanks,

Kent
From: Ryan Novosielski
Sent: Wednesday, November 27, 2024 9:31 AM
To: Kent L. Hanson
Cc: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] sinfo not listing any partitions
If you’re sure you’ve restarted everything after the config change, 
are you also sure that you don’t have that stuff hidden from your 
current user? You can try -a to rule that out. Or run as root.

--
#BlackLivesMatter

|| \\UTGERS , 
|---*O*---

||_// the State |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ 
RBHS Campus
||  \\    of NJ | Office of Advanced Research Computing - MSB 
A555B, Newark

     `'


On Nov 27, 2024, at 09:56, Kent L. Hanson via slurm-users
 wrote:
I am doing a new install of slurm 24.05.3. I have all the packages
built and installed on headnode and compute node with the same
munge.key, slurm.conf, and gres.conf file. I was able to run
munge and unmunge commands to test munge successfully. Time is
synced with chronyd. I can’t seem to find any useful errors in
the logs. For some reason when I run sinfo no nodes are listed. I
just see the headers for each column. Has anyone seen this or
know what a next step of troubleshooting would be? I’m new to
this and not sure where to go from here. Thanks for any and all help!
The odd output I am seeing
[username@headnode ~] sinfo
PARTITION AVAIL    TIMELIMIT NODES   STATE NODELIST
(Nothing is output showing status of partition or nodes)
Slurm.conf
ClusterName=slurmkvasir
SlurmctldHost=kadmin2
MpiDefault=none
ProctrackType=proctrack/cgroup
PrologFlags=contain
ReturnToService=2
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/cgroup
MinJobAge=600
SchedulerType=sched/backfill
SelectType=select/cons_tres
PriorityType=priority/multifactor
AccountingStorageHost=localhost
AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu,cpu,node
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmLogFile=/var/log/slurm/slurmd.log
nodeName=k[001-448]
PartitionName=default Nodes=k[001-448] Default=YES
MaxTime=INFINITE State=up
Slurmctld.log
Error: Configured MailProg is invalid
Slurmctld version 24.05.3 started on cluster slurmkvasir
Accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld:
Registering slurmctld at port 8617
Error: read_slurm_conf: default partition not set.
Recovered state of 448 nodes
Down nodes: k[002-448]
Recovered information about 0 jobs
Recovered state of 0 reservations
Read_slurm_conf: backup_controller not specified
Select/cons_tres; select_p_reconfigure: select/cons_tres: reconfigure
Running as primary controller
Slurmd.log
Error: Node configuration differs from hardware: CPUS=1:40(hw)
Boards=1:1(hw) SocketsPerBoard=1:2(hw) CoresPerSocket=1:20(hw)
ThreadsPerCore:1:1(hw)
CPU frequency setting not configured for this node
Slurmd version 24.05.3 started
Slurmd started on Wed, 27 Nov 2024 06:51:03 -0700
CPUS=1 Boards=1 Cores=1 Threads=1 Memory=192030 TmpDisk=95201
uptime 166740 CPUSpecList=(null) FeaturesAvail=(null)
FeaturesActive=(null)
Error: _forward_thread: failed to k019 (10.

[slurm-users] slurm releases

2025-04-01 Thread Patrick Begou via slurm-users

Hi slurm team,

I would like to ask for some clarification about Slurm releases. Why are
two versions of Slurm available?


I am speaking of 24.05.7 versus 24.11.3 on
https://www.schedmd.com/slurm-support/release-announcements and of the
announcements made on this list.


I'm managing small clusters in a French public research lab, and I
successfully updated (let's say re-installed, as it was very old) an old
one with slurm-24.05.3 some months ago.
Now I'm trying to deploy a second cluster with the same approach and
I've selected 24.11.3-1 (but why not use 24.05.7?).


Moreover, I have some difficulties deploying 24.11.3. I've built the
rpms with:


   rpmbuild \
 --define '_prefix /usr' \
 --with pmix --with ucx --with mysql --with hwloc \
 -ta slurm-24.11.3.tar.bz2

but some packages won't install:

   $ dnf list --showduplicates slurm-slurmd
   Last metadata expiration check: 0:33:15 ago on Tue Apr  1 18:03:07 2025.
   Available Packages
   slurm-slurmd.x86_64
   24.11.3-1.el9  bb-local

   $ sudo dnf install slurm-slurmd
   Last metadata expiration check: 0:33:03 ago on Tue Apr  1 18:03:25 2025.
   Error:
 Problem: conflicting requests
  - nothing provides is needed by slurm-slurmd-24.11.3-1.el9.x86_64
   from bb-local
  - nothing provides installed needed by
   slurm-slurmd-24.11.3-1.el9.x86_64 from bb-local
  - nothing provides not needed by
   slurm-slurmd-24.11.3-1.el9.x86_64 from bb-local
   (try to add '--skip-broken' to skip uninstallable packages or
   '--nobest' to use not only best candidate packages)
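
If it is of any use, I believe the dependencies recorded in the built
package can be listed with something like:

   dnf repoquery --requires slurm-slurmd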

I think I will try to build 24.05.7 or 24.05.3 next, but I'm interested
in any advice.


Thank you

Patrick

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Setting QoS with slurm 24.05.7

2025-04-22 Thread Patrick Begou via slurm-users

Hi Michael,

thanks for your explanation. I understand that setting
"MaxTRESMinsPerJob=cpu=172800" will allow (in my case):


-  a job on the full cluster for 6h
-  a job on half of the cluster for 12 hours

But if I do not want the same user to run, at the same time, 2 jobs on
half of the cluster for 12 hours (and fill the cluster for a long time),
how can I limit their running jobs to 172800 CPU-minutes?
I was looking for something like "MaxTRESMinsPerUser" but did not find
such a limit.


Patrick



On 18/04/2025 at 17:17, Michael Gutteridge wrote:

Hi

I think you want one of the "MaxTRESMins*" options:

MaxTRESMins=TRES=<minutes>[,TRES=<minutes>,...]
MaxTRESMinsPJ=TRES=<minutes>[,TRES=<minutes>,...]
MaxTRESMinsPerJob=TRES=<minutes>[,TRES=<minutes>,...]
Maximum number of TRES minutes each job is able to use in this 
association. This is overridden if set directly on a user. Default is 
the cluster's limit. To clear a previously set value use the modify 
command with a new value of -1 for each TRES id.


   - sacctmgr(1)

The "MaxCPUs" is a limit on the number of CPUs the association can use.

 -- Michael


On Fri, Apr 18, 2025 at 8:01 AM Patrick Begou via slurm-users 
 wrote:


Hi all,

I'm trying to set up a QoS on a small 5-node cluster running Slurm
24.05.7. My goal is to limit resources with a (time x number of cores)
strategy, to avoid one large job requesting all the resources for too
long. I've read https://slurm.schedmd.com/qos.html and some
discussions, but my setup is still not working.

I think I need to set these parameters:
MaxCPUsPerJob=172800
MaxWallDurationPerJob=48:00:00
Flags=DenyOnLimit,OverPartQOS

for:
12h max for 240 cores => (12*240*60=172800mn)
no job can exceed 2 days
do not accept jobs out of these limits.

What I've done:

1) create the QoS:
sudo sacctmgr add qos workflowlimit \
  MaxWallDurationPerJob=48:00:00 \
  MaxCPUsPerJob=172800 \
  Flags=DenyOnLimit,OverPartQOS


2) Check
sacctmgr show qos Name=workflowlimit format=Name%16,MaxTRES,MaxWall
    Name   MaxTRES MaxWall
     - ---
   workflowlimit    cpu=172800  2-00:00:00

3) Set the QoS for the account "most" which is the default account
for
the users:
sudo sacctmgr modify account name=most set qos=workflowlimit

4) Check
$ sacctmgr show assoc format=account,cluster,user,qos
    Account    Cluster   User  QOS
-- -- -- 
   root osorno  normal
   root osorno   root   normal
   legi osorno  normal
   most osorno   workflowlimit
   most osorno  begou    workflowlimit

5) Modify slurm.conf with:
 AccountingStorageEnforce=limits,qos
and propagate on the 5 nodes and the front end (done via Ansible)

6) Check
clush -b -w osorno-fe,osorno,osorno-0-[0-4] 'grep
AccountingStorageEnforce /etc/slurm/slurm.conf'
---
osorno,osorno-0-[0-4],osorno-fe (7)
---
AccountingStorageEnforce=limits,qos

7) restart slurmd on all the compute nodes and slurmctld +
slurmdbd on
the management node.

But I can still request 400 cores for 24 hours:
[begou@osorno ~]$ srun -n 400 -t 24:0:0 --pty bash
bash-5.1$ squeue
   JOBID    PARTITION   NAME   USER ST TIME
START_TIME TIME_LIMIT CPUS NODELIST(REASON)
 147    genoa   bash  begou  R 0:03
2025-04-18T16:52:11 1-00:00:00  400 osorno-0-[0-4]

So I must have missed something?

My partition (I've only one) in slurm.conf is:
PartitionName=genoa  State=UP Default=YES MaxTime=48:00:00
DefaultTime=24:00:00 Shared=YES OverSubscribe=NO Nodes=osorno-0-[0-4]

Thanks

Patrick


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Setting QoS with slurm 24.05.7

2025-04-18 Thread Patrick Begou via slurm-users

Hi all,

I'm trying to set up a QoS on a small 5-node cluster running Slurm
24.05.7. My goal is to limit resources with a (time x number of cores)
strategy, to avoid one large job requesting all the resources for too
long. I've read https://slurm.schedmd.com/qos.html and some discussions,
but my setup is still not working.


I think I need to set these parameters:
MaxCPUsPerJob=172800
MaxWallDurationPerJob=48:00:00
Flags=DenyOnLimit,OverPartQOS

for:
12h max for 240 cores => (12*240*60 = 172800 min)
no job can exceed 2 days
do not accept jobs outside these limits.

What I've done:

1) create the QoS:
sudo sacctmgr add qos workflowlimit \
 MaxWallDurationPerJob=48:00:00 \
 MaxCPUsPerJob=172800 \
 Flags=DenyOnLimit,OverPartQOS


2) Check
sacctmgr show qos Name=workflowlimit format=Name%16,MaxTRES,MaxWall
   Name   MaxTRES MaxWall
    - ---
  workflowlimit    cpu=172800  2-00:00:00

3) Set the QoS for the account "most" which is the default account for 
the users:

sudo sacctmgr modify account name=most set qos=workflowlimit

4) Check
$ sacctmgr show assoc format=account,cluster,user,qos
   Account    Cluster   User  QOS
-- -- -- 
  root osorno  normal
  root osorno   root   normal
  legi osorno  normal
  most osorno   workflowlimit
  most osorno  begou    workflowlimit

5) Modify slurm.conf with:
    AccountingStorageEnforce=limits,qos
and propagate on the 5 nodes and the front end (done via Ansible)

6) Check
clush -b -w osorno-fe,osorno,osorno-0-[0-4] 'grep 
AccountingStorageEnforce /etc/slurm/slurm.conf'

---
osorno,osorno-0-[0-4],osorno-fe (7)
---
AccountingStorageEnforce=limits,qos

7) restart slurmd on all the compute nodes and slurmctld + slurmdbd on 
the management node.


But I can still request 400 cores for 24 hours:
[begou@osorno ~]$ srun -n 400 -t 24:0:0 --pty bash
bash-5.1$ squeue
  JOBID    PARTITION   NAME   USER ST TIME  
START_TIME TIME_LIMIT CPUS NODELIST(REASON)
    147    genoa   bash  begou  R 0:03 
2025-04-18T16:52:11 1-00:00:00  400 osorno-0-[0-4]


So I must have missed something?

My partition (I've only one) in slurm.conf is:
PartitionName=genoa  State=UP Default=YES MaxTime=48:00:00 
DefaultTime=24:00:00 Shared=YES OverSubscribe=NO Nodes=osorno-0-[0-4]


Thanks

Patrick


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Setting QoS with slurm 24.05.7

2025-04-25 Thread Patrick Begou via slurm-users

Yes Michael! With this setting it does the job.
There are so many tuning possibilities in Slurm that I had missed this one.
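
For the archives, what I applied is essentially this (a sketch, adapted
to the QoS created earlier, using the option Michael pointed to):

   sudo sacctmgr modify qos workflowlimit set MaxTRESRunMinsPerUser=cpu=172800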

Thank you very much.

Patrick

On 22/04/2025 at 16:30, Michael Gutteridge wrote:

Whoops, my mistake, sorry. Is this closer to what you want:

MaxTRESRunMinsPU
MaxTRESRunMinsPerUser
Maximum number of TRES minutes each user is able to use. This takes 
into consideration the time limit of running jobs. If the limit is 
reached, no new jobs are started until other jobs finish to allow time 
to free up.


https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESRunMinsPU

 - Michael

On Tue, Apr 22, 2025 at 1:35 AM Patrick Begou 
 wrote:


Hi Michael,

thanks for your explanation. I understand that setting
"MaxTRESMinsPerJob=cpu=172800"  will allow (in my case)

-  a job on the full cluster for 6h
-  a job on half of the cluster for 12 hours

But if I do not want the same user to run, at the same time, 2 jobs
on half of the cluster for 12 hours (and fill the cluster for a long
time), how can I limit their running jobs to 172800 CPU-minutes?
I was looking for something like "MaxTRESMinsPerUser" but did not
find such a limit.

Patrick



On 18/04/2025 at 17:17, Michael Gutteridge wrote:

Hi

I think you want one of the "MaxTRESMins*" options:

MaxTRESMins=TRES=<minutes>[,TRES=<minutes>,...]
MaxTRESMinsPJ=TRES=<minutes>[,TRES=<minutes>,...]
MaxTRESMinsPerJob=TRES=<minutes>[,TRES=<minutes>,...]
Maximum number of TRES minutes each job is able to use in this
association. This is overridden if set directly on a user.
Default is the cluster's limit. To clear a previously set value
use the modify command with a new value of -1 for each TRES id.

 - sacctmgr(1)

The "MaxCPUs" is a limit on the number of CPUs the association
can use.

 -- Michael

