Remove this line:

#SBATCH --nodes=1

With that directive, Slurm assumes you're requesting the whole node; --ntasks=1 should be adequate.
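
For reference, a minimal sketch of the submit script with that line dropped (directives copied from the script quoted below; the final srun line is just a placeholder for whatever job.sh actually runs):

#!/bin/bash
#SBATCH --partition=NodeSet1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:k80:1

srun ./my_gpu_job    # placeholder for the actual job step

The idea being that two such jobs, each asking for gpu:k80:1, can then share cph-gpu1 with one GPU apiece.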

On 11/7/19 4:19 PM, Mike Mosley wrote:
Greetings all:

I'm attempting to configure the scheduler to schedule our GPU boxes but have run into a bit of a snag.

I have a box with two Tesla K80s.  With my current configuration, the scheduler will schedule one job on the box, but if I submit a second job, it queues up until the first one finishes:

My submit script:

#SBATCH --partition=NodeSet1

#SBATCH --nodes=1

#SBATCH --ntasks=1

#SBATCH --gres=gpu:k80:1


My slurm.conf (the things I think are relevant)

GresTypes=gpu

SelectType=select/cons_tres


PartitionName=NodeSet1 Nodes=cht-c[1-4],cph-gpu1 Default=YES MaxTime=INFINITE OverSubscribe=FORCE State=UP


NodeName=cph-gpu1 CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=257541 Gres=gpu:k80:2 Feature=gpu State=UNKNOWN



My gres.conf:

NodeName=cph-gpu1 Name=gpu Type=k80 File=/dev/nvidia[0-1]
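
As a quick sanity check (assuming the NVIDIA driver is loaded on cph-gpu1), the device files referenced above and the GPUs themselves can be verified with:

$ ls -l /dev/nvidia0 /dev/nvidia1    # device files named in gres.conf
$ nvidia-smi -L                      # lists the GPUs the driver can see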



and finally, the results of squeue:

$ squeue

JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
  208  NodeSet1 job.sh jmmosley PD  0:00     1 (Resources)
  207  NodeSet1 job.sh jmmosley  R  4:12     1 cph-gpu1
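
For reference, the (Resources) reason on job 208 can be narrowed down by looking at what the running job and the node report as allocated (job ID and node name taken from the output above):

$ scontrol show job 207        # shows the GRES/TRES actually allocated to the running job
$ scontrol show node cph-gpu1  # shows the node's configured vs. allocated resources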


Any idea what I am missing or have misconfigured?



Thanks in advance.


Mike


--

J. Michael Mosley
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC  28223
704.687.7065    mmos...@uncc.edu
