Hi,

Is lowercase #sbatch really valid? As far as I know, sbatch only recognizes uppercase #SBATCH directives, so the lowercase --time and --mem lines in the script below would be treated as plain comments and ignored.
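For anyone hitting this later: since sbatch only parses lines beginning with uppercase #SBATCH, a quick grep can flag directive-like lines it will silently skip. A minimal sketch (the filename test_hostname.sh comes from the quoted post; the heredoc just recreates a shortened copy of that script):

```shell
#!/bin/sh
# Recreate a shortened copy of the script from the quoted post.
cat > test_hostname.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hostname
#sbatch --time=00:50
#sbatch --mem=1M
#SBATCH --nodes=1
srun hostname
EOF

# Flag directive-like lines that sbatch will NOT honor: match "#sbatch"
# in any case, then drop the correctly uppercased ones.
grep -in '^#sbatch' test_hostname.sh | grep -v '#SBATCH'
# prints:
# 3:#sbatch --time=00:50
# 4:#sbatch --mem=1M
```

With --mem ignored this way, the job would fall back to the default of a whole node's memory (note MinMemoryNode=257758M in the scontrol output below, equal to the node's RealMemory), which exceeds the QOS cap of mem=1G and would explain the QOSMaxMemoryPerUser pending reason.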
> On 14.11.2019, at 14:09, Sukman <[email protected]> wrote:
>
> Hi Brian,
>
> thank you for the suggestion.
>
> It appears that my node is in drain state.
> I rebooted the node and everything became fine.
>
> However, the QOS still cannot be applied properly.
> Do you have any opinion regarding this issue?
>
> $ sacctmgr show qos where Name=normal_compute format=Name,Priority,MaxWal,MaxTRESPU
>       Name   Priority     MaxWall     MaxTRESPU
> ---------- ---------- ----------- -------------
> normal_co+         10    00:01:00  cpu=2,mem=1G
>
> When I run the following script:
>
> #!/bin/bash
> #SBATCH --job-name=hostname
> #sbatch --time=00:50
> #sbatch --mem=1M
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --ntasks-per-node=1
> #SBATCH --cpus-per-task=1
> #SBATCH --nodelist=cn110
>
> srun hostname
>
> it turns out that the QOSMaxMemoryPerUser limit has been hit:
>
> $ squeue
>  JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON)
>     88      defq hostname   sukman PD  0:00      1 (QOSMaxMemoryPerUser)
>
> $ scontrol show job 88
> JobId=88 JobName=hostname
>    UserId=sukman(1000) GroupId=nobody(1000) MCS_label=N/A
>    Priority=4294901753 Nice=0 Account=user QOS=normal_compute
>    JobState=PENDING Reason=QOSMaxMemoryPerUser Dependency=(null)
>    Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>    RunTime=00:00:00 TimeLimit=00:01:00 TimeMin=N/A
>    SubmitTime=2019-11-14T19:49:37 EligibleTime=2019-11-14T19:49:37
>    StartTime=Unknown EndTime=Unknown Deadline=N/A
>    PreemptTime=None SuspendTime=None SecsPreSuspend=0
>    LastSchedEval=2019-11-14T19:55:50
>    Partition=defq AllocNode:Sid=itbhn02:51072
>    ReqNodeList=cn110 ExcNodeList=(null)
>    NodeList=(null)
>    NumNodes=1-1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    TRES=cpu=1,node=1
>    Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
>    MinCPUsNode=1 MinMemoryNode=257758M MinTmpDiskNode=0
>    Features=(null) DelayBoot=00:00:00
>    Gres=(null) Reservation=(null)
>    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>    Command=/home/sukman/script/test_hostname.sh
>    WorkDir=/home/sukman/script
>    StdErr=/home/sukman/script/slurm-88.out
>    StdIn=/dev/null
>    StdOut=/home/sukman/script/slurm-88.out
>    Power=
>
> $ scontrol show node cn110
> NodeName=cn110 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUErr=0 CPUTot=56 CPULoad=0.01
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=cn110 NodeHostName=cn110 Version=17.11
>    OS=Linux 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017
>    RealMemory=257758 AllocMem=0 FreeMem=255742 Sockets=56 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=268629 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=defq
>    BootTime=2019-11-14T18:50:56 SlurmdStartTime=2019-11-14T18:53:23
>    CfgTRES=cpu=56,mem=257758M,billing=56
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> ---------------------------------------
>
> Sukman
> ITB Indonesia
>
>
> ----- Original Message -----
> From: "Brian Andrus" <[email protected]>
> To: [email protected]
> Sent: Tuesday, November 12, 2019 10:41:42 AM
> Subject: Re: [slurm-users] Limiting the number of CPU
>
> You are trying to specifically run on node cn110, so you may want to
> check that out with sinfo.
>
> A quick "sinfo -R" can list any down machines and the reasons.
>
> Brian Andrus
