Remove this line:

#SBATCH --nodes=1

With that directive, Slurm assumes you're requesting the whole node; --ntasks=1 should be adequate.
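
For reference, a minimal sketch of the submit script with that line dropped (directives copied from the script quoted below; the final srun line is just a placeholder for whatever job.sh actually runs):

#!/bin/bash
#SBATCH --partition=NodeSet1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:k80:1

srun ./my_gpu_job    # placeholder for the actual job step

The idea being that two such jobs, each asking for gpu:k80:1, can then share cph-gpu1 with one GPU apiece.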

On 11/7/19 4:19 PM, Mike Mosley wrote:
Greetings all:

I'm attempting to configure the scheduler to schedule our GPU boxes but have run into a bit of a snag.

I have a box with two Tesla K80s.  With my current configuration, the scheduler will schedule one job on the box, but if I submit a second job, it queues up until the first one finishes:

My submit script:

#SBATCH --partition=NodeSet1

#SBATCH --nodes=1

#SBATCH --ntasks=1

#SBATCH --gres=gpu:k80:1


My slurm.conf (the things I think are relevant)

GresTypes=gpu

SelectType=select/cons_tres


PartitionName=NodeSet1 Nodes=cht-c[1-4],cph-gpu1 Default=YES MaxTime=INFINITE OverSubscribe=FORCE State=UP


NodeName=cph-gpu1 CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=257541 Gres=gpu:k80:2 Feature=gpu State=UNKNOWN



My gres.conf:

NodeName=cph-gpu1 Name=gpu Type=k80 File=/dev/nvidia[0-1]
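
As a quick sanity check (assuming the NVIDIA driver is loaded on cph-gpu1), the device files referenced above and the GPUs themselves can be verified with:

$ ls -l /dev/nvidia0 /dev/nvidia1    # device files named in gres.conf
$ nvidia-smi -L                      # lists the GPUs the driver can see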



and finally, the results of squeue:

$ squeue

JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
  208  NodeSet1 job.sh jmmosley PD  0:00     1 (Resources)
  207  NodeSet1 job.sh jmmosley  R  4:12     1 cph-gpu1
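
For reference, the (Resources) reason on job 208 can be narrowed down by looking at what the running job and the node report as allocated (job ID and node name taken from the output above):

$ scontrol show job 207        # shows the GRES/TRES actually allocated to the running job
$ scontrol show node cph-gpu1  # shows the node's configured vs. allocated resources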


Any idea what I am missing or have misconfigured?



Thanks in advance.


Mike


--

J. Michael Mosley
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC  28223
704.687.7065    mmos...@uncc.edu
