I could have missed a detail in my description, but we definitely don’t enable 
OverSubscribe, Shared, or ExclusiveUser. All three are set to “no” on all 
active partitions.
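
In case the running slurmctld has drifted from the file, the live values can 
also be double-checked with something like:

# scontrol show partition gpu | grep -oE '(OverSubscribe|ExclusiveUser)=[^ ]+'
# scontrol show partition any-interactive | grep -oE '(OverSubscribe|ExclusiveUser)=[^ ]+'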

Relevant subset of slurm.conf and current squeue output:

=====

# egrep '^PartitionName=(gpu|any-interactive) ' /etc/slurm/slurm.conf
PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00 
MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF 
ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO 
MaxCPUsPerNode=16 ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP 
TRESBillingWeights=CPU=3.00,Mem=1.024G,GRES/gpu=30.00 Nodes=gpunode[001-004]
PartitionName=any-interactive Default=NO MinNodes=1 MaxNodes=4 
DefaultTime=02:00:00 MaxTime=02:00:00 AllowGroups=ALL PriorityJobFactor=3 
PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 
PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL 
LLN=NO MaxCPUsPerNode=12 ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 
State=UP TRESBillingWeights=CPU=3.00,Mem=1.024G,GRES/gpu=30.00 
Nodes=node[001-040],gpunode[001-004]
# squeue -o "%6i %.15P %.10j %.5u %4C %5D %16R %6b" | grep gpunode002
778462             gpu CNN_GRU.sh miibr 1    1     gpunode002       gpu:1
778632 any-interactive       bash rnour 1    1     gpunode002       N/A

=====
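
To see exactly which cores and GPUs each of the two jobs above is holding on 
gpunode002 (and confirm the allocations don't overlap), the detailed job view 
should help, assuming both jobs are still running:

# scontrol -d show job 778462 | grep -E 'CPU_IDs|GRES'
# scontrol -d show job 778632 | grep -E 'CPU_IDs|GRES'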

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Relu 
Patrascu <r...@cs.toronto.edu>
Reply-To: Slurm User Community List <slurm-users@lists.schedmd.com>
Date: Wednesday, September 30, 2020 at 4:02 PM
To: "slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Running gpu and cpu jobs on the same node

If you don't use OverSubscribe, then resources are not shared. Whatever 
resources a job is allocated are not available to other jobs, regardless of 
partition.

Relu
On 2020-09-30 16:12, Ahmad Khalifa wrote:
I have a machine with 4 RTX 2080 Ti GPUs and a Core i9. I submit jobs to it 
through MPI PMI2 (from Relion).

If I use 5 MPI processes with 4 threads each, then basically I'm using all 4 
GPUs and 20 threads of my CPU.

My question: my current configuration allows submitting jobs to the same node 
through a different partition, but if I use #SBATCH --partition=cpu, will the 
submitted jobs use only the remaining 2 cores (4 threads), or will they share 
resources with my GPU job?

Thanks.

