Hi Sean,

Here is the output for the gpu-rtx-reserved qos:

sacctmgr show account withassoc -p | grep gpu-rtx-reserved
default|default|default|uea_cluster||cjr13geu|1|||||||||||||||gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|

scontrol show part gpu-rtx6000-2
PartitionName=gpu-rtx6000-2
   AllowGroups=ALL AllowAccounts=ALL AllowQos=gpu-rtx,gpu-rtx-reserved,jakeuea
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=9 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=g[15-29]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
   State=UP TotalCPUs=720 TotalNodes=15 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=3996 MaxMemPerNode=UNLIMITED

On a different note, we have the following in slurm.conf:

AccountingStorageUser=slurm

But we have been adding qos and assigning users as root. Can this be an issue?

Amjad

On Tue, Aug 31, 2021 at 8:22 AM Sean Crosby <scro...@unimelb.edu.au> wrote:

> What does sacctmgr show for the user you added to have access to the QoS,
> and what does Slurm show for the partition config?
>
> sacctmgr show account withassoc -p
> scontrol show part gpu-rtx6000-2
>
> Sean
> ------------------------------
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of
> Amjad Syed <amjad...@gmail.com>
> *Sent:* Tuesday, 31 August 2021 17:03
> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* Re: [slurm-users] [EXT] User association with partition and Qos
>
> * External email: Please exercise caution *
> ------------------------------
> Hello, me again.
>
> Just found out that when our slurmctld restarts, all qos are gone.
>
> I mean users who have an association with the qos cannot submit jobs with
> sbatch; they get the error
>
> sbatch: error: Batch job submission failed: Invalid qos specification
>
> Do we need to make any more changes in slurm.conf so that the qos becomes
> permanent?
>
> Amjad
>
> On Fri, Aug 27, 2021 at 3:32 PM Amjad Syed <amjad...@gmail.com> wrote:
>
> Hi Sean,
>
> Thanks for the suggestion, seems to work now.
>
> Majid
>
> On Fri, Aug 27, 2021 at 12:56 PM Sean Crosby <scro...@unimelb.edu.au>
> wrote:
>
> Hi Amjad,
>
> Make sure you have qos in the config entry AccountingStorageEnforce
>
> e.g.
>
> AccountingStorageEnforce=associations,limits,qos,safe
>
> Sean
>
> ------------------------------
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of
> Amjad Syed <amjad...@gmail.com>
> *Sent:* Friday, 27 August 2021 20:28
> *To:* slurm-us...@schedmd.com <slurm-us...@schedmd.com>
> *Subject:* [EXT] [slurm-users] User association with partition and Qos
>
> * External email: Please exercise caution *
> ------------------------------
> Hello all,
>
> We are having an issue understanding user association and partition.
>
> Currently we have a partition with 30 GPU cards.
>
> We have defined a qos gpu-rtx that allows a user to reserve 2 cards:
>
> sacctmgr show qos gpu-rtx format=MaxTRESPU%60
>
> MaxTRESPU
> -----------------------------------------------------
> cpu=96,gres/gpu=2
>
> We have defined a user test that is associated with this qos:
>
> sacctmgr show assoc user=test format=user,qos
>
> Qos
> gpu-rtx
>
> Now we define another qos gpu-rtx-reserved that allows gpu=8:
>
> sacctmgr show qos gpu-rtx-reserved format=MaxTRESPU%60
>
> MaxTRESPU
> -----------------------------------------------------
> cpu=192,gres/gpu=8
>
> User test is not associated with the gpu-rtx-reserved qos, so he should not
> be able to use more than gpu=2.
>
> Both of these qos are now in slurm.conf for the partition:
>
> PartitionName=gpu-rtx6000-2 State=UP Nodes=g[15-29] MaxNodes=9
> MaxTime=168:00:00 DefMemPerCPU=3996 AllowQos=gpu-rtx,gpu-rtx-reserved
>
> But we found out that even though the user is not associated with
> gpu-rtx-reserved, if the user specifies gpu-rtx-reserved in his Slurm
> script, he can reserve 8 GPU cards.
>
> So our question is: can users associated with one partition qos use the
> other qos in the partition even if they are not associated with it? Or, in
> other words, can we only define one partition qos and not more than one?
>
> Hope I was able to explain.
>
> Any advice if we want the partition to use more than one qos with different
> limits, and users associated with one qos should not be able to use the
> other qos?
>
> Majid
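
To make the effect of Sean's suggestion concrete, here is a minimal Python sketch of the check as it is described in this thread. It is a simplified model and not Slurm's actual code: the helper names parse_enforce and qos_allowed are hypothetical, the "associations,limits" value for the earlier setting is an assumption (the thread does not show it), and the model assumes that without qos in AccountingStorageEnforce only the partition's AllowQos list is consulted, which matches the behaviour reported for user test.

```python
from typing import Iterable, Set


def parse_enforce(value: str) -> Set[str]:
    """Split an AccountingStorageEnforce value into a set of lowercase flags."""
    return {flag.strip().lower() for flag in value.split(",") if flag.strip()}


def qos_allowed(requested_qos: str,
                partition_allow_qos: Iterable[str],
                user_assoc_qos: Iterable[str],
                enforce: Set[str]) -> bool:
    """Simplified model of QOS admission (hypothetical helper, not a Slurm API)."""
    # The partition's AllowQos list is checked in every case.
    if requested_qos not in set(partition_allow_qos):
        return False
    # The user's association is consulted only when "qos" is enforced.
    if "qos" in enforce:
        return requested_qos in set(user_assoc_qos)
    return True


# Values taken from the thread.
partition_allow_qos = ["gpu-rtx", "gpu-rtx-reserved"]
test_user_qos = ["gpu-rtx"]

# Assumed earlier setting; the thread does not show the original value.
without_qos = parse_enforce("associations,limits")
# Setting suggested by Sean.
with_qos = parse_enforce("associations,limits,qos,safe")

print(qos_allowed("gpu-rtx-reserved", partition_allow_qos, test_user_qos, without_qos))
print(qos_allowed("gpu-rtx-reserved", partition_allow_qos, test_user_qos, with_qos))
print(qos_allowed("gpu-rtx", partition_allow_qos, test_user_qos, with_qos))
```

Under these assumptions the first call prints True (user test can request gpu-rtx-reserved, as originally observed), the second prints False once qos is enforced, and the third prints True because gpu-rtx is in the user's association. In this model a partition can list several qos in AllowQos, and the per-user restriction comes from the association.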