We are running Slurm 20.11.2-1 from the CentOS 7 RPMs. The partition is set up to allow OverSubscribe:
NodeName=ne[04-09] CPUs=32 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN

PartitionName=neon-noSMT Nodes=ne[04-09] Default=NO MaxTime=3-00:00:00 DefaultTime=4:00:00 State=UP OverSubscribe=YES

I asked a user to submit the first job with:

#SBATCH --partition=neon-noSMT
#SBATCH --job-name="ns072"
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=24:00:00
#SBATCH --exclusive
#SBATCH --error=ns072.err
#SBATCH --output=ns072.out
#SBATCH --mail-type=ALL   # NONE, BEGIN, END, FAIL, REQUEUE, ALL
#SBATCH --mail-user=tgjenk...@txcorp.com

I then asked the user to submit a second job using the same #SBATCH directives as above, but adding:

#SBATCH --oversubscribe

and submitting it onto the node already running the first job:

sbatch --nodelist={node running first job} run.sbatch

Note that each job uses only 8 tasks/cores out of the 32 available. When he submits the second job, the first job slows down dramatically (roughly 300x slower). If I log in to the node running the two jobs, only 8 cores/tasks are being used in total, not 8 for each job.

These are the SCHEDULING parameters from /etc/slurm/slurm.conf:

# SCHEDULING
# out 29Dec20
#FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear
SelectTypeParameters=CR_ONE_TASK_PER_CORE

Is there a different parameter I should be looking at?

Thanks in advance,
Anne Hammond
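P.S. In case it helps with the diagnosis, this is roughly how I have been checking which CPUs each job is actually bound to on the node (the job ID and PID below are placeholders, not values from our system):

# show the detailed allocation, including the CPU_IDs assigned on the node
scontrol -d show job <jobid> | grep -i cpu_ids

# cross-check the affinity of a running task belonging to one of the jobs
taskset -cp <pid of a task from the job>

If both jobs report the same CPU_IDs, that would match what I see on the node.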