Hi Amjad,

AccountingStorageUser is the user used to connect to the accounting database. If you have it defined in slurm.conf, it is ignored.
From the output you showed, it says the user cjr13geu in the cluster uea_cluster has access to the QoS. How are you adding the QoS to other users?

The way you would do it would be

    sacctmgr modify account <accountname> user=<username> set qos+=gpu-rtx-reserved

or

    sacctmgr modify account <accountname> set qos+=gpu-rtx-reserved

if you want to give it to every user in <accountname>.

Sean
________________________________
From: slurm-users <[email protected]> on behalf of Amjad Syed <[email protected]>
Sent: Tuesday, 31 August 2021 17:46
To: Slurm User Community List <[email protected]>
Subject: Re: [slurm-users] [EXT] User association with partition and Qos

External email: Please exercise caution
________________________________

Hi Sean

Here is the output for the gpu-rtx-reserved qos

    sacctmgr show account withassoc -p | grep gpu-rtx-reserved
    default|default|default|uea_cluster||cjr13geu|1|||||||||||||||gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|

    scontrol show part gpu-rtx6000-2
    PartitionName=gpu-rtx6000-2
       AllowGroups=ALL AllowAccounts=ALL AllowQos=gpu-rtx,gpu-rtx-reserved,jakeuea
       AllocNodes=ALL Default=NO QoS=N/A
       DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
       MaxNodes=9 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
       Nodes=g[15-29]
       PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
       OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
       State=UP TotalCPUs=720 TotalNodes=15 SelectTypeParameters=NONE
       JobDefaults=(null)
       DefMemPerCPU=3996 MaxMemPerNode=UNLIMITED

On a different note, we have the following in slurm.conf

    AccountingStorageUser=slurm

But we have been adding qos and assigning users as root. Can this be an issue?

Amjad

On Tue, Aug 31, 2021 at 8:22 AM Sean Crosby <[email protected]> wrote:

What does sacctmgr show for the user you added to have access to the QoS, and what does Slurm show for the partition config?
    sacctmgr show account withassoc -p
    scontrol show part gpu-rtx6000-2

Sean
________________________________
From: slurm-users <[email protected]> on behalf of Amjad Syed <[email protected]>
Sent: Tuesday, 31 August 2021 17:03
To: Slurm User Community List <[email protected]>
Subject: Re: [slurm-users] [EXT] User association with partition and Qos
________________________________

Hello, me again

Just found out that when our slurmctld restarts, all qos are gone. I mean users who have an association with the qos cannot submit jobs with sbatch; they get this error:

    sbatch: error: Batch job submission failed: Invalid qos specification

Do we need to make any more changes in slurm.conf so that the qos becomes permanent?

Amjad

On Fri, Aug 27, 2021 at 3:32 PM Amjad Syed <[email protected]> wrote:

Hi Sean,

Thanks for the suggestion, seems to work now.

Majid

On Fri, Aug 27, 2021 at 12:56 PM Sean Crosby <[email protected]> wrote:

Hi Amjad,

Make sure you have qos in the config entry AccountingStorageEnforce, e.g.

    AccountingStorageEnforce=associations,limits,qos,safe

Sean
________________________________
From: slurm-users <[email protected]> on behalf of Amjad Syed <[email protected]>
Sent: Friday, 27 August 2021 20:28
To: [email protected] <[email protected]>
Subject: [EXT] [slurm-users] User association with partition and Qos
________________________________

Hello all

We are having an issue understanding user association and partition. Currently we have a partition with 30 GPU cards.
We have defined a qos gpu-rtx that allows a user to reserve 2 cards

    sacctmgr show qos gpu-rtx format=MaxTRESPU%60
    MaxTRESPU
    -----------------------------------------------------
    cpu=96,gres/gpu=2

We have defined a user test that is associated with this qos

    sacctmgr show assoc user=test format=user,qos
    Qos
    gpu-rtx

Now we define another qos gpu-rtx-reserved that allows gpu=8

    sacctmgr show qos gpu-rtx-reserved format=MaxTRESPU%60
    MaxTRESPU
    -----------------------------------------------------
    cpu=192,gres/gpu=8

User test is not associated with the gpu-rtx-reserved qos, so he should not be able to use more than gpu=2.

Both of these qos are now in slurm.conf for the partition

    PartitionName=gpu-rtx6000-2 State=UP Nodes=g[15-29] MaxNodes=9 MaxTime=168:00:00 DefMemPerCPU=3996 AllowQos=gpu-rtx,gpu-rtx-reserved

But we found out that even though the user is not associated with gpu-rtx-reserved, if the user uses gpu-rtx-reserved in his slurm script, he can reserve 8 gpu cards.

So our question is: can users associated with one partition qos use the other qos in the partition even if they are not associated with it? Or, in other words, can we only define one partition qos and not more than one?

Hope I was able to explain. Any advice if we want the partition to use more than one qos with different limits, where users associated with one qos should not be able to use the other qos?

Majid
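
The question in this thread comes down to which QoS names each user's association actually lists. A minimal Python sketch of that check follows. It assumes the input text comes from something like `sacctmgr -nP show assoc format=user,qos`, which prints one `user|qos1,qos2` line per association. The function name `qos_by_user` and the sample data are hypothetical, built from values quoted in the messages above.

```python
def qos_by_user(parsable_output: str) -> dict[str, set[str]]:
    """Map each user to the set of QoS names listed in its associations.

    Expects pipe-delimited lines of the form ``user|qos1,qos2``.
    Account-level associations (empty user field) end up under the key "".
    """
    result: dict[str, set[str]] = {}
    for line in parsable_output.splitlines():
        line = line.strip()
        if not line:
            continue
        user, _, qos_field = line.partition("|")
        names = {q for q in qos_field.split(",") if q}
        result.setdefault(user, set()).update(names)
    return result


# Hypothetical sample shaped like the assumed sacctmgr output,
# using the users and QoS names that appear in the thread.
sample = """\
test|gpu-rtx
cjr13geu|gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos
"""

access = qos_by_user(sample)
print("gpu-rtx-reserved" in access["test"])      # False
print("gpu-rtx-reserved" in access["cjr13geu"])  # True
```

This only reports what the accounting database records. As noted in the reply above, slurmctld rejects a QoS the user is not associated with only when AccountingStorageEnforce includes qos.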
