Yes, the algorithm works like that: 1 CPU (core) per job (task). As someone already mentioned, you need to oversubscribe the CPU cores by a factor of 10 (--oversubscribe=10), meaning 10 jobs on each core in your case. That goes in slurm.conf.

Best,
Feng

On Fri, Jun 21, 2024 at 6:52 AM Arnuld via slurm-users
<slurm-users@lists.schedmd.com> wrote:
>
> > Every job will need at least 1 core just to run
> > and if there are only 4 cores on the machine,
> > one would expect a max of 4 jobs to run.
>
> I have 3500+ GPU cores available. You mean each GPU job requires at least one
> CPU? Can't we run a job with just GPU without any CPUs? This sbatch script
> requires 100 GPU cores, can't we run 35 in parallel?
>
> #! /usr/bin/env bash
>
> #SBATCH --output="%j.out"
> #SBATCH --error="%j.error"
> #SBATCH --partition=pgpu
> #SBATCH --gres=shard:100
>
> sleep 10
> echo "Current date and time: $(date +"%Y-%m-%d %H:%M:%S")"
> echo "Running..."
> sleep 10
>
> On Thu, Jun 20, 2024 at 11:23 PM Brian Andrus via slurm-users
> <slurm-users@lists.schedmd.com> wrote:
>>
>> Well, if I am reading this right, it makes sense.
>>
>> Every job will need at least 1 core just to run and if there are only 4
>> cores on the machine, one would expect a max of 4 jobs to run.
>>
>> Brian Andrus
>>
>> On 6/20/2024 5:24 AM, Arnuld via slurm-users wrote:
>> > I have a machine with a quad-core CPU and an Nvidia GPU with 3500+
>> > cores. I want to run around 10 jobs in parallel on the GPU (mostly
>> > are CUDA based jobs).
>> >
>> > PROBLEM: Each job asks for only 100 shards (runs usually for a minute
>> > or so), then I should be able to run 3500/100 = 35 jobs in
>> > parallel but slurm runs only 4 jobs in parallel keeping the rest in
>> > the queue.
>> >
>> > I have this in slurm.conf and gres.conf:
>> >
>> > # GPU
>> > GresTypes=gpu,shard
>> > # COMPUTE NODES
>> > PartitionName=pzero Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>> > PartitionName=pgpu Nodes=hostgpu MaxTime=INFINITE State=UP
>> > NodeName=hostgpu NodeAddr=x.x.x.x Gres=gpu:gtx_1080_ti:1,shard:3500
>> > CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1
>> > RealMemory=64255 State=UNKNOWN
>> > ----------------------
>> > Name=gpu Type=gtx_1080_ti File=/dev/nvidia0 Count=1
>> > Name=shard Count=3500 File=/dev/nvidia0
>> >
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
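
For reference, a rough sketch of what the oversubscribe setting from the reply at the top might look like in slurm.conf, reusing the pgpu partition line from the original post. In slurm.conf the partition option is spelled OverSubscribe, and FORCE:10 allows up to 10 running jobs per core. This assumes SelectType=select/cons_tres, which is not shown in the posted config, and it is untested on this setup:

```
# slurm.conf (sketch only; assumes SelectType=select/cons_tres, not shown in the thread)
# FORCE:10 lets up to 10 jobs share each core, so 4 cores could hold up to 40 jobs.
PartitionName=pgpu Nodes=hostgpu MaxTime=INFINITE State=UP OverSubscribe=FORCE:10
```

With 4 cores this would raise the CPU-side ceiling to 40 jobs, so the 3500 shards (35 jobs at 100 shards each) would become the limit instead. If memory is configured as a consumable resource, the per-job memory request may also cap how many jobs fit on the node.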