Yes, that is how the algorithm works: 1 CPU (core) per job (task).
As someone already mentioned, you need to oversubscribe the CPU cores
by 10 (the OverSubscribe partition option in slurm.conf), meaning up to
10 jobs on each core in your case.
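
A rough sketch of the arithmetic, assuming each job takes 1 CPU core
(the default) and 100 shards as in your script. The function name is
made up for illustration, and the OverSubscribe=FORCE:10 line in the
comment is an assumption about how you might set it, not something I
have tested on your cluster:

```shell
#!/usr/bin/env bash
# Estimate how many jobs can run at once on one node.
# Assumption: every job needs 1 CPU core plus a fixed number of shards.
#
# Hypothetical slurm.conf change that allows 10 jobs per core:
#   PartitionName=pgpu Nodes=hostgpu MaxTime=INFINITE State=UP OverSubscribe=FORCE:10

max_parallel_jobs() {
    local cores=$1 jobs_per_core=$2 shards=$3 shards_per_job=$4
    local by_cpu=$(( cores * jobs_per_core ))      # limit from CPU cores
    local by_shard=$(( shards / shards_per_job ))  # limit from GPU shards
    # Whichever limit is smaller decides how many jobs start.
    echo $(( by_cpu < by_shard ? by_cpu : by_shard ))
}

max_parallel_jobs 4 1 3500 100    # no oversubscription: limited by the 4 cores
max_parallel_jobs 4 10 3500 100   # 10 jobs per core: limited by the shards
```

With 4 cores and no oversubscription the CPU side is the bottleneck,
which matches the 4 running jobs you see. With 10 jobs per core the
shard count becomes the limit instead.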
Best,

Feng

On Fri, Jun 21, 2024 at 6:52 AM Arnuld via slurm-users
<slurm-users@lists.schedmd.com> wrote:
>
> > Every job will need at least 1 core just to run
> > and if there are only 4 cores on the machine,
> > one would expect a max of 4 jobs to run.
>
> I have 3500+ GPU cores available. You mean each GPU job requires at least one
> CPU? Can't we run a job with just the GPU and no CPUs? This sbatch script
> requires 100 GPU cores, can't we run 35 in parallel?
>
> #! /usr/bin/env bash
>
> #SBATCH --output="%j.out"
> #SBATCH --error="%j.error"
> #SBATCH --partition=pgpu
> #SBATCH --gres=shard:100
>
> sleep 10
> echo "Current date and time: $(date +"%Y-%m-%d %H:%M:%S")"
> echo "Running..."
> sleep 10
>
> On Thu, Jun 20, 2024 at 11:23 PM Brian Andrus via slurm-users 
> <slurm-users@lists.schedmd.com> wrote:
>>
>> Well, if I am reading this right, it makes sense.
>>
>> Every job will need at least 1 core just to run and if there are only 4
>> cores on the machine, one would expect a max of 4 jobs to run.
>>
>> Brian Andrus
>>
>> On 6/20/2024 5:24 AM, Arnuld via slurm-users wrote:
>> > I have a machine with a quad-core CPU and an Nvidia GPU with 3500+
>> > cores.  I want to run around 10 jobs in parallel on the GPU (mostly
>> > CUDA-based jobs).
>> >
>> > PROBLEM: Each job asks for only 100 shards (and usually runs for a
>> > minute or so), so I should be able to run 3500/100 = 35 jobs in
>> > parallel, but Slurm runs only 4 jobs in parallel and keeps the rest
>> > in the queue.
>> >
>> > I have this in slurm.conf and gres.conf:
>> >
>> > # GPU
>> > GresTypes=gpu,shard
>> > # COMPUTE NODES
>> > PartitionName=pzero Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>> > PartitionName=pgpu Nodes=hostgpu MaxTime=INFINITE State=UP
>> > NodeName=hostgpu NodeAddr=x.x.x.x Gres=gpu:gtx_1080_ti:1,shard:3500
>> > CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1
>> > RealMemory=64255 State=UNKNOWN
>> > ----------------------
>> > Name=gpu Type=gtx_1080_ti File=/dev/nvidia0 Count=1
>> > Name=shard Count=3500  File=/dev/nvidia0
>> >
>> >
>> >
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
