Excuse me... I see that the output of squeue says:

  170 IACTIVE bash mahmood PD 0:00 1 (AssocGrpMemLimit)

I don't understand why the memory limit has been reached. I cannot see the memory usage of a running job from the sacct command. However, using "top" on the compute node, I see 6 cores in use, each using 400MB. So the total (6 x 400MB = 2.4GB) is below the 8G which is defined for the user.

Regards,
Mahmood

On Fri, May 11, 2018 at 4:20 PM, Mahmood Naderan <mahmood...@gmail.com> wrote:
> Hi
> I have added a user to multiple partitions. That account name actually
> corresponds to a set of limitations which I define for a user.
>
> [root@rocks7 ~]# sacctmgr list association format=partition,account,user,grptres,maxwall
> Partition  Account    User       GrpTRES       MaxWall
> ---------- ---------- ---------- ------------- -----------
>            root
>            root       root
>            em1
> iactive    em1        mahmood    cpu=6,mem=8G  30-00:00:00
> plan1      em1        mahmood    cpu=6,mem=8G  30-00:00:00
>            monthly
> plan2      monthly    mahmood    cpu=32,mem=6+ 30-00:00:00
> [root@rocks7 ~]# squeue -j 167
> JOBID PARTITION NAME USER    ST TIME    NODES NODELIST(REASON)
> 167   PLAN1     test mahmood R  5:58:41 1     compute-0-3
> [root@rocks7 ~]# squeue -j 167 -o %C
> CPUS
> 6
>
>
> As you see the user is running a job with the maximum core counts
> allowed. Now, if I run
>
> [mahmood@rocks7 Downloads]$ salloc -p IACTIVE -A em1
> salloc: Pending job allocation 170
> salloc: job 170 queued and waiting for resources
>
> Which is pending for resources. I want to be sure that the pending is
> REALLY related to reaching the maximum tres limits and NOT a
> configuration problem.
>
> Is that OK? Hope that I asked my question correctly ;)
>
>
> Regards,
> Mahmood