I could have missed a detail in my description, but we definitely do not enable OverSubscribe, Shared, or ExclusiveUser; all three are set to NO on every active partition.
Current subset of slurm.conf and squeue output:

=====
# egrep '^PartitionName=(gpu|any-interactive) ' /etc/slurm/slurm.conf
PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=16 ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP TRESBillingWeights=CPU=3.00,Mem=1.024G,GRES/gpu=30.00 Nodes=gpunode[001-004]
PartitionName=any-interactive Default=NO MinNodes=1 MaxNodes=4 DefaultTime=02:00:00 MaxTime=02:00:00 AllowGroups=ALL PriorityJobFactor=3 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=12 ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP TRESBillingWeights=CPU=3.00,Mem=1.024G,GRES/gpu=30.00 Nodes=node[001-040],gpunode[001-004]

# squeue -o "%6i %.15P %.10j %.5u %4C %5D %16R %6b" | grep gpunode002
778462 gpu CNN_GRU.sh miibr 1 1 gpunode002 gpu:1
778632 any-interactive bash rnour 1 1 gpunode002 N/A
=====

(A way to double-check what each of those two jobs actually holds on gpunode002 is sketched at the bottom of this message.)

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Relu Patrascu <r...@cs.toronto.edu>
Reply-To: Slurm User Community List <slurm-users@lists.schedmd.com>
Date: Wednesday, September 30, 2020 at 4:02 PM
To: "slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Running gpu and cpu jobs on the same node

If you don't use OverSubscribe, then resources are not shared. Whatever resources a job is allocated are not available to other jobs, regardless of partition.

Relu

On 2020-09-30 16:12, Ahmad Khalifa wrote:

I have a machine with 4 RTX 2080 Ti cards and a Core i9. I submit jobs to it through MPI PMI2 (from Relion). If I use 5 MPI ranks with 4 threads each, I'm essentially using all 4 GPUs and 20 threads of my CPU.

My question is: my current configuration allows submitting jobs to the same node under a different partition, but I'm not sure whether, with #SBATCH --partition=cpu, the submitted jobs will only use the remaining 2 cores (4 threads), or whether they will share resources with my GPU job.

Thanks.
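P.S. In case it helps, one way to confirm whether the two jobs on gpunode002 overlap is to ask Slurm for the detailed allocation of each job and for the node's allocated-versus-configured TRES. This is only a sketch, and the exact fields printed vary with Slurm version: the first two commands show, per node, which CPU ids, how much memory, and which GPUs each job holds; the last shows how much of gpunode002 is currently allocated.

=====
# scontrol -d show job 778462
# scontrol -d show job 778632
# scontrol show node gpunode002 | egrep 'CPUAlloc|CfgTRES|AllocTRES'
=====

If the CPU_IDs ranges reported for the two jobs don't intersect, they are not sharing cores, which is consistent with Relu's point that without OverSubscribe an allocation is exclusive to its job.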