Remove this line from your submit script:

    #SBATCH --nodes=1

With it, Slurm assumes you're requesting the whole node; --ntasks=1 should be
adequate.
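For example, a trimmed submit script (same partition and GRES request, just
without the node count) might look like this. Untested sketch, and the
program name is only a placeholder:

    #!/bin/bash
    #SBATCH --partition=NodeSet1
    #SBATCH --ntasks=1
    #SBATCH --gres=gpu:k80:1       # one of the two K80s on cph-gpu1
    srun ./my_gpu_program          # placeholder for the actual job step

If the second job still pends with (Resources), 'scontrol show node cph-gpu1'
should show CfgTRES and AllocTRES, which will tell you whether the first job
is actually holding all 16 CPUs or both GPUs.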
On 11/7/19 4:19 PM, Mike Mosley wrote:
Greetings all:
I'm attempting to configure Slurm to schedule jobs onto our GPU boxes, but
I've run into a bit of a snag.
I have a box with two Tesla K80s. With my current configuration, the
scheduler will schedule one job on the box, but if I submit a second
job, it queues up until the first one finishes:
My submit script:
#SBATCH --partition=NodeSet1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:k80:1
My slurm.conf (the parts I think are relevant):
GresTypes=gpu
SelectType=select/cons_tres
PartitionName=NodeSet1 Nodes=cht-c[1-4],cph-gpu1 Default=YES MaxTime=INFINITE OverSubscribe=FORCE State=UP
NodeName=cph-gpu1 CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=257541 Gres=gpu:k80:2 Feature=gpu State=UNKNOWN
My gres.conf:
NodeName=cph-gpu1 Name=gpu Type=k80 File=/dev/nvidia[0-1]
And finally, the output of squeue:
$ squeue
 JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
   208  NodeSet1 job.sh jmmosley PD  0:00     1 (Resources)
   207  NodeSet1 job.sh jmmosley  R  4:12     1 cph-gpu1
Any idea what I am missing or have misconfigured?
Thanks in advance.
Mike
--
J. Michael Mosley
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC 28223
704.687.7065   mmos...@uncc.edu