Hi Mike,

IIRC, with the default configuration each job gets all of the memory on the node, so you can only run one job per node at a time. Check:

root@admin:~# scontrol show config | grep DefMemPerNode
DefMemPerNode = 64000
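If that turns out to be the issue, the quickest user-side workaround is to request an explicit memory amount per job, so two jobs can fit on the node together. A rough sketch of what the submit script could look like, assuming ~32 GB per job is enough for your workload (the srun line is just a placeholder for whatever you actually run):

#!/bin/bash
#SBATCH --partition=NodeSet1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:k80:1
#SBATCH --mem=32G            # request only part of the node's RealMemory so a second job can start

srun ./your_gpu_program      # placeholder for the actual workload

With --mem set on both jobs, the second gpu:k80:1 job should be able to start on cph-gpu1 as long as CPUs, memory, and a free GPU remain available.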
Regards,
Alex

On Thu, Nov 7, 2019 at 1:21 PM Mike Mosley <mike.mos...@uncc.edu> wrote:
> Greetings all:
>
> I'm attempting to configure the scheduler to schedule our GPU boxes but
> have run into a bit of a snag.
>
> I have a box with two Tesla K80s. With my current configuration, the
> scheduler will schedule one job on the box, but if I submit a second job,
> it queues up until the first one finishes.
>
> My submit script:
>
> #SBATCH --partition=NodeSet1
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --gres=gpu:k80:1
>
> My slurm.conf (the things I think are relevant):
>
> GresTypes=gpu
> SelectType=select/cons_tres
>
> PartitionName=NodeSet1 Nodes=cht-c[1-4],cph-gpu1 Default=YES MaxTime=INFINITE OverSubscribe=FORCE State=UP
>
> NodeName=cph-gpu1 CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=257541 Gres=gpu:k80:2 Feature=gpu State=UNKNOWN
>
> My gres.conf:
>
> NodeName=cph-gpu1 Name=gpu Type=k80 File=/dev/nvidia[0-1]
>
> And finally, the results of squeue:
>
> $ squeue
>   JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
>     208  NodeSet1 job.sh jmmosley PD  0:00     1 (Resources)
>     207  NodeSet1 job.sh jmmosley  R  4:12     1 cph-gpu1
>
> Any idea what I am missing or have misconfigured?
>
> Thanks in advance.
>
> Mike
>
> --
> J. Michael Mosley
> University Research Computing
> The University of North Carolina at Charlotte
> 9201 University City Blvd
> Charlotte, NC 28223
> 704.687.7065   jmmos...@uncc.edu <mmos...@uncc.edu>
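P.S. If you'd rather fix this cluster-wide instead of editing every job script, the admin-side approach is to track memory as a consumable resource and give it a sensible per-CPU default. A hedged sketch of the slurm.conf lines, assuming you stay on select/cons_tres (the 8000 MB figure is only an example; pick something that matches cph-gpu1's RealMemory divided across its 16 CPUs):

SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory   # treat cores and memory as consumable resources
DefMemPerCPU=8000                     # default MB per allocated CPU when a job does not set --mem

Then run 'scontrol reconfigure' (or restart slurmctld/slurmd, since select plugin changes may require a restart) so the daemons pick up the new settings.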