Hello Durai,
you did not specify the amount of memory in your node configuration.
Perhaps it defaults to 1 MB, and so your 1 MB job already uses all the
memory that the scheduler thinks the node has...?
What does "scontrol show node slurm-gpu-1" say? Look for the
"RealMemory" field in the output.
Best,
Christoph
On 26/08/2020 11.35, Durai Arasan wrote:
Hello,
this is my node configuration:
NodeName=slurm-gpu-1 NodeAddr=192.168.0.200 Procs=16 Gres=gpu:2 State=UNKNOWN
NodeName=slurm-gpu-2 NodeAddr=192.168.0.124 Procs=1 Gres=gpu:0 State=UNKNOWN
PartitionName=gpu Nodes=slurm-gpu-1 Default=NO MaxTime=INFINITE AllowAccounts=whitelist,gpu_users State=UP
PartitionName=compute Nodes=slurm-gpu-1,slurm-gpu-2 Default=YES MaxTime=INFINITE AllowAccounts=whitelist State=UP
and this is one of the job scripts. You can see --mem is set to 1M, so
it is very minimal.
#!/bin/bash
#SBATCH -J Test1
#SBATCH --nodelist=slurm-gpu-1
#SBATCH --mem=1M
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -o /home/centos/Test1-%j.out
#SBATCH -e /home/centos/Test1-%j.err
srun sleep 60
Thanks,
Durai
On Wed, Aug 26, 2020 at 2:49 AM Jacqueline Scoggins <jscogg...@lbl.gov> wrote:
What is the OverSubscribe variable set to for your partitions?
By default OverSubscribe=NO, which means that none of your cores will
be shared with other jobs. With OverSubscribe set to YES or FORCE,
you can append a number after FORCE to specify how many jobs
may run on each core of each node in the partition.
Look at this page for a better understanding:
https://slurm.schedmd.com/cons_res_share.html
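For example, a partition definition with oversubscription enabled could look roughly like this (OverSubscribe=FORCE:4, i.e. up to 4 jobs per core, is just an illustrative value, not something taken from your config):

PartitionName=compute Nodes=slurm-gpu-1,slurm-gpu-2 Default=YES MaxTime=INFINITE OverSubscribe=FORCE:4 State=UP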
You can also check the OverSubscribe setting of a partition with the
sinfo "%h" output format option:
sinfo -o '%P %.5a %.10h %N' | head
PARTITION AVAIL OVERSUBSCR NODELIST
Look at the sinfo options for further details.
Jackie
On Tue, Aug 25, 2020 at 9:58 AM Durai Arasan <arasan.du...@gmail.com> wrote:
Hello,
On our cluster we have SelectTypeParameters set to "CR_Core_Memory".
Under these conditions multiple jobs should be able to run on
the same node, but they refuse to be allocated together: only one
job runs on the node and the rest stay in the pending state.
When we changed SelectTypeParameters to "CR_Core", however, the
issue was resolved and multiple jobs were successfully allocated
to the same node and ran there concurrently.
Does anyone know why this behavior occurs? Why does including
memory as a consumable resource lead to node-exclusive behavior?
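For illustration, the two setups we compared differ only in the SelectTypeParameters line of slurm.conf (the SelectType value is shown here just as the usual companion setting, not quoted verbatim from our file):

# setup that shows the problem:
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# setup that behaves as expected:
SelectTypeParameters=CR_Core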
Thanks,
Durai
--
Dr. Christoph Brüning
Universität Würzburg
Rechenzentrum
Am Hubland
D-97074 Würzburg
Tel.: +49 931 31-80499