Hi,

I don't think the statement below about --nodes=1 is true. It just means
you want one and not more than one node. This can be important if
multiple cores are requested but the program is not, say, an MPI program.
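For example (just a minimal sketch to illustrate the point; the core
count and the program name 'my_threaded_program' are made up), a script
like the one below asks for eight cores for a single non-MPI program.
Without --nodes=1, Slurm would be free to satisfy --ntasks=8 with cores
spread over several nodes, which a shared-memory program could not use:

#!/bin/bash
# All eight cores must come from a single node.
#SBATCH --nodes=1
#SBATCH --ntasks=8

# 'my_threaded_program' stands in for a shared-memory (non-MPI) program;
# it can only use the cores on the node it is started on.
./my_threaded_program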
You can see which cores a running job is using with

  scontrol show job --detail <job id>

HTH

Loris

Prentice Bisbal <pbis...@pppl.gov> writes:

> Remove this line:
>
> #SBATCH --nodes=1
>
> Slurm assumes you're requesting the whole node. --ntasks=1 should be
> adequate.
>
> On 11/7/19 4:19 PM, Mike Mosley wrote:
>
> Greetings all:
>
> I'm attempting to configure the scheduler to schedule our GPU boxes
> but have run into a bit of a snag.
>
> I have a box with two Tesla K80s. With my current configuration, the
> scheduler will schedule one job on the box, but if I submit a second
> job, it queues up until the first one finishes.
>
> My submit script:
>
> #SBATCH --partition=NodeSet1
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --gres=gpu:k80:1
>
> My slurm.conf (the things I think are relevant):
>
> GresTypes=gpu
> SelectType=select/cons_tres
> PartitionName=NodeSet1 Nodes=cht-c[1-4],cph-gpu1 Default=YES
>   MaxTime=INFINITE OverSubscribe=FORCE State=UP
> NodeName=cph-gpu1 CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1
>   RealMemory=257541 Gres=gpu:k80:2 Feature=gpu State=UNKNOWN
>
> My gres.conf:
>
> NodeName=cph-gpu1 Name=gpu Type=k80 File=/dev/nvidia[0-1]
>
> and finally, the results of squeue:
>
> $ squeue
>   JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
>     208  NodeSet1 job.sh jmmosley PD  0:00     1 (Resources)
>     207  NodeSet1 job.sh jmmosley  R  4:12     1 cph-gpu1
>
> Any idea what I am missing or have misconfigured?
>
> Thanks in advance.
>
> Mike
>
> --
> J. Michael Mosley
> University Research Computing
> The University of North Carolina at Charlotte
> 9201 University City Blvd
> Charlotte, NC 28223
> 704.687.7065                                      jmmos...@uncc.edu

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de