Yes. It seems that whatever amount a user specifies, Slurm will reserve. The real-time memory usage of the other jobs is less than what their users specified. I had thought Slurm would handle that dynamically in order to put more jobs into the running state.
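For what it's worth, requested versus actually used memory can be compared with sacct (a sketch, assuming job accounting is enabled on this cluster; ReqMem and MaxRSS are standard sacct format fields, not taken from the outputs below):

$ sacct -o JobID,JobName,ReqMem,MaxRSS,State

Slurm reserves the full ReqMem amount for the lifetime of the job regardless of how much the job actually touches, which is why a node can look half idle in FreeMem terms and still have no allocatable memory left.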
Regards,
Mahmood

On Wed, Apr 17, 2019 at 7:54 PM Prentice Bisbal <pbis...@pppl.gov> wrote:
> Mahmood,
>
> What do you see as the problem here? To me, there is no problem and the
> scheduler is working exactly as it should. The reason "Resources" means
> that there are not enough computing resources available for your job to
> run right now, so the job is sitting in the queue in the pending state,
> waiting for the necessary resources to become available. This is exactly
> what schedulers are for.
>
> As Andreas pointed out, looking at the output of 'scontrol show node
> compute-0-0' that you provided, compute-0-0 has 32 cores and 63 GB of RAM.
> Out of that, 9 cores and 55 GB of RAM have already been allocated, leaving
> 23 cores and only 8 GB of RAM available for other jobs. The job you
> submitted requested 20 cores (tasks, technically) and 40 GB of RAM. Since
> compute-0-0 doesn't have enough RAM available, Slurm is keeping your job
> in the queue until enough RAM is available for it to run. This is exactly
> what Slurm should be doing.
>
> Prentice
>
> On 4/17/19 11:00 AM, Henkel, Andreas wrote:
>
> I think there isn't enough memory.
> AllocTRES shows mem=55G, and your job wants another 40G, although the
> node only has 63G in total.
> Best,
> Andreas
>
> Am 17.04.2019 um 16:45 schrieb Mahmood Naderan <mahmood...@gmail.com>:
>
> Hi,
> Although it was fine for previous job runs, the following script is now
> stuck as PD with the reason "Resources".
>
> $ cat slurm_script.sh
> #!/bin/bash
> #SBATCH --output=test.out
> #SBATCH --job-name=g09-test
> #SBATCH --ntasks=20
> #SBATCH --nodelist=compute-0-0
> #SBATCH --mem=40GB
> #SBATCH --account=z7
> #SBATCH --partition=EMERALD
> g09 test.gjf
>
> $ sbatch slurm_script.sh
> Submitted batch job 878
>
> $ squeue
>  JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
>    878   EMERALD g09-test shakerza PD  0:00     1 (Resources)
>
> However, all things look good.
>
> $ sacctmgr list association format=user,account,partition,grptres%20 | grep shaker
> shakerzad+      local
> shakerzad+         z7    emerald       cpu=20,mem=40G
>
> $ scontrol show node compute-0-0
> NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=9 CPUTot=32 CPULoad=8.89
>    AvailableFeatures=rack-0,32CPUs
>    ActiveFeatures=rack-0,32CPUs
>    Gres=(null)
>    NodeAddr=10.1.1.254 NodeHostName=compute-0-0 Version=18.08
>    OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
>    RealMemory=64261 AllocMem=56320 FreeMem=37715 Sockets=32 Boards=1
>    State=MIXED ThreadsPerCore=1 TmpDisk=444124 Weight=20511900 Owner=N/A MCS_label=N/A
>    Partitions=CLUSTER,WHEEL,EMERALD,QUARTZ
>    BootTime=2019-04-06T10:03:47 SlurmdStartTime=2019-04-06T10:05:54
>    CfgTRES=cpu=32,mem=64261M,billing=47
>    AllocTRES=cpu=9,mem=55G
>    CapWatts=n/a
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> Any idea?
>
> Regards,
> Mahmood
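For reference, the shortfall can be read directly from the node record quoted above (a sketch using only the fields already shown there):

$ scontrol show node compute-0-0 | grep -oE '(RealMemory|AllocMem)=[0-9]+'
RealMemory=64261
AllocMem=56320

64261 MB configured minus 56320 MB already allocated leaves roughly 7.9 GB, well below the 40 GB the job requests, so the job stays pending until allocations on compute-0-0 shrink, or until the --nodelist constraint is relaxed so other nodes in the partition can be considered.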