Mahmood,

What do you see as the problem here? To me, there is no problem and the scheduler is working exactly as it should. The reason "Resources" means that there are not enough computing resources available for your job to run right now, so the job is sitting in the queue in the pending state, waiting for the necessary resources to become available. This is exactly what schedulers are supposed to do.

As Andreas pointed out, looking at the output of 'scontrol show node compute-0-0' that you provided, compute-0-0 has 32 cores and about 63 GB of RAM. Of those, 9 cores and 55 GB of RAM have already been allocated, leaving 23 cores but only about 8 GB of RAM available for other jobs. The job you submitted requested 20 cores (tasks, technically) and 40 GB of RAM. Since compute-0-0 doesn't have enough RAM available, Slurm is keeping your job in the queue until enough RAM is available for it to run. This is exactly what Slurm should be doing.
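For what it's worth, you can sanity-check those numbers straight from the scontrol output below. RealMemory and AllocMem are reported in MB, so the memory still available for allocation is just their difference (the grep here is only one way to pull the fields out):

$ scontrol show node compute-0-0 | grep -oE '(RealMemory|AllocMem)=[0-9]+'
RealMemory=64261
AllocMem=56320
$ echo $((64261 - 56320))   # MB still available to allocate, roughly 7.8 GB
7941

That 7941 MB is well short of the 40 GB your job requests with --mem=40GB, so the job has to wait.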

Prentice

On 4/17/19 11:00 AM, Henkel, Andreas wrote:
I think there isn’t enough memory.
AllocTRES shows mem=55G,
and your job wants another 40G, while the node only has 63G in total.
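You can pull both figures out of the scontrol output you posted in one go, e.g.:

$ scontrol show node compute-0-0 | grep -E 'RealMemory|AllocTRES'
   RealMemory=64261 AllocMem=56320 FreeMem=37715 Sockets=32 Boards=1
   AllocTRES=cpu=9,mem=55G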
Best,
Andreas

On 17.04.2019 at 16:45, Mahmood Naderan <mahmood...@gmail.com> wrote:

Hi,
Although it was fine for previous job runs, the following script is now stuck as PD (pending) with "Resources" given as the reason.

$ cat slurm_script.sh
#!/bin/bash
#SBATCH --output=test.out
#SBATCH --job-name=g09-test
#SBATCH --ntasks=20
#SBATCH --nodelist=compute-0-0
#SBATCH --mem=40GB
#SBATCH --account=z7
#SBATCH --partition=EMERALD
g09 test.gjf
$ sbatch slurm_script.sh
Submitted batch job 878
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               878   EMERALD g09-test shakerza PD       0:00      1 (Resources)



However, all things look good.

$ sacctmgr list association format=user,account,partition,grptres%20 | grep shaker
shakerzad+      local
shakerzad+         z7    emerald cpu=20,mem=40G
$ scontrol show node compute-0-0
NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=9 CPUTot=32 CPULoad=8.89
   AvailableFeatures=rack-0,32CPUs
   ActiveFeatures=rack-0,32CPUs
   Gres=(null)
   NodeAddr=10.1.1.254 NodeHostName=compute-0-0 Version=18.08
   OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
   RealMemory=64261 AllocMem=56320 FreeMem=37715 Sockets=32 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=444124 Weight=20511900 Owner=N/A MCS_label=N/A
   Partitions=CLUSTER,WHEEL,EMERALD,QUARTZ
   BootTime=2019-04-06T10:03:47 SlurmdStartTime=2019-04-06T10:05:54
   CfgTRES=cpu=32,mem=64261M,billing=47
   AllocTRES=cpu=9,mem=55G
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s


Any idea?

Regards,
Mahmood

