Hi Mike,

IIRC, with the default configuration each job gets all of the memory on the node, so you can only run one job per node at a time. Check:

root@admin:~# scontrol show config | grep DefMemPerNode
DefMemPerNode = 64000
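If that turns out to be the issue, the quickest user-side workaround is to request an explicit memory amount per job, so two jobs can fit on the node together. A rough sketch of what the submit script could look like, assuming ~32 GB per job is enough for your workload (the srun line is just a placeholder for whatever you actually run):

#!/bin/bash
#SBATCH --partition=NodeSet1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:k80:1
#SBATCH --mem=32G            # request only part of the node's RealMemory so a second job can start

srun ./your_gpu_program      # placeholder for the actual workload

With --mem set on both jobs, the second gpu:k80:1 job should be able to start on cph-gpu1 as long as CPUs, memory, and a free GPU remain available.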
Regards,
Alex

On Thu, Nov 7, 2019 at 1:21 PM Mike Mosley <mike.mos...@uncc.edu> wrote:
> Greetings all:
>
> I'm attempting to configure the scheduler to schedule our GPU boxes but
> have run into a bit of a snag.
>
> I have a box with two Tesla K80s. With my current configuration, the
> scheduler will schedule one job on the box, but if I submit a second job,
> it queues up until the first one finishes.
>
> My submit script:
>
> #SBATCH --partition=NodeSet1
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --gres=gpu:k80:1
>
> My slurm.conf (the things I think are relevant):
>
> GresTypes=gpu
> SelectType=select/cons_tres
>
> PartitionName=NodeSet1 Nodes=cht-c[1-4],cph-gpu1 Default=YES MaxTime=INFINITE OverSubscribe=FORCE State=UP
>
> NodeName=cph-gpu1 CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=257541 Gres=gpu:k80:2 Feature=gpu State=UNKNOWN
>
> My gres.conf:
>
> NodeName=cph-gpu1 Name=gpu Type=k80 File=/dev/nvidia[0-1]
>
> And finally, the results of squeue:
>
> $ squeue
>   JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
>     208  NodeSet1 job.sh jmmosley PD  0:00     1 (Resources)
>     207  NodeSet1 job.sh jmmosley  R  4:12     1 cph-gpu1
>
> Any idea what I am missing or have misconfigured?
>
> Thanks in advance.
>
> Mike
>
> --
> J. Michael Mosley
> University Research Computing
> The University of North Carolina at Charlotte
> 9201 University City Blvd
> Charlotte, NC 28223
> 704.687.7065   jmmos...@uncc.edu <mmos...@uncc.edu>
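P.S. If you'd rather fix this cluster-wide instead of editing every job script, the admin-side approach is to track memory as a consumable resource and give it a sensible per-CPU default. A hedged sketch of the slurm.conf lines, assuming you stay on select/cons_tres (the 8000 MB figure is only an example; pick something that matches cph-gpu1's RealMemory divided across its 16 CPUs):

SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory   # treat cores and memory as consumable resources
DefMemPerCPU=8000                     # default MB per allocated CPU when a job does not set --mem

Then run 'scontrol reconfigure' (or restart slurmctld/slurmd, since select plugin changes may require a restart) so the daemons pick up the new settings.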