[slurm-users] ntasks and gres question

2022-04-06 Thread Chip Seraphine
Hello, In my cluster, every node has one instance of a gres called 'io_nic'. Its purpose is to make it easier for users to ensure that jobs performing heavy network I/O do not get scheduled simultaneously on the same machine.

$ sinfo -N -o '%N %Gres'
NODELIST GRESres
chhq-supgcmp001
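
A minimal sketch of how a custom gres like this is usually wired up (node name taken from the message above; the count and the rest of the NodeName line are assumptions):

    # slurm.conf
    GresTypes=io_nic
    NodeName=chhq-supgcmp001 Gres=io_nic:1 ...

    # gres.conf
    NodeName=chhq-supgcmp001 Name=io_nic Count=1

    # requesting it at submit time
    sbatch --gres=io_nic:1 job.sh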

[slurm-users] Strange memory limit behavior with --mem-per-gpu

2022-04-06 Thread Paul Raines
I have a user who submitted an interactive srun job using:

srun --mem-per-gpu 64 --gpus 1 --nodes 1

From sacct for this job we see:

ReqTRES : billing=4,cpu=1,gres/gpu=1,mem=10G,node=1
AllocTRES : billing=4,cpu=1,gres/gpu=1,mem=64M,node=1

(where 10G I assume comes from th
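
Worth noting: --mem-per-gpu without a unit suffix is read as megabytes, which matches the mem=64M in AllocTRES above. Assuming the user actually wanted 64 GB, an explicit suffix avoids the surprise:

    srun --mem-per-gpu=64G --gpus=1 --nodes=1 --pty bash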

Re: [slurm-users] Memory usage not tracked

2022-04-06 Thread Chin,David
Hi, Xand: How does adding "ReqMem" to the sacct command change the output? E.g. on my cluster running Slurm 20.02.7 (on RHEL8), our GPU nodes have TRESBillingWeights=CPU=0,Mem=0,GRES/gpu=43:

$ sacct --format=JobID%25,State,AllocTRES%50,ReqTRES,ReqMem,ReqCPUS | grep RUNNING
JobID
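
For reference, billing weights like those quoted above come from the partition definition in slurm.conf; a sketch (partition name assumed):

    PartitionName=gpu TRESBillingWeights="CPU=0,Mem=0,GRES/gpu=43"

On 20.02, ReqMem also prints a suffix showing whether the request was per node or per CPU (e.g. 10Gn vs 2Gc), which helps when comparing it against AllocTRES.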

Re: [slurm-users] Node is not allocating all CPUs

2022-04-06 Thread Guertin, David S.
slurm.conf contains the following:

SelectType=select/cons_tres
SelectTypeParameters=CR_Core
AccountingStorageTRES=gres/gpu

Could this be constraining CfgTRES=cpu=16 somehow?

David Guertin

Re: [slurm-users] Node is not allocating all CPUs

2022-04-06 Thread Guertin, David S.
No, the user is submitting four jobs, each requesting 1/4 of the memory and 1/4 of the CPUs (i.e. 8 out of 32). But even though there are 32 physical cores, Slurm only shows 16 as trackable resources. From scontrol show node node020:

CfgTRES=cpu=16,mem=257600M,billing=16,gres/gpu=4

Why would

Re: [slurm-users] Node is not allocating all CPUs

2022-04-06 Thread Sarlo, Jeffrey S
Are the jobs getting assigned memory amounts that would only allow 16 processors to be used when the jobs are running on the node?

Jeff
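
A quick way to test that hypothesis (node name from the thread) is to compare what the running jobs were actually granted:

    scontrol show node node020 | grep -E 'CfgTRES|AllocTRES'
    squeue -w node020 -o '%.10i %.8m %.4C'   # job id, min memory, CPUs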

Re: [slurm-users] Node is not allocating all CPUs

2022-04-06 Thread Guertin, David S.
Thanks. That shows 32 cores, as expected:

# /cm/shared/apps/slurm/19.05.8/sbin/slurmd -C
NodeName=node020 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=257600 UpTime=0-22:39:36

But I can't understand why, when users submit jobs, the node is only allocating 16.
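
One common cause (an assumption here, not confirmed in the thread) is a stale NodeName definition in slurm.conf overriding the detected hardware; the slurmd -C line can be pasted in almost verbatim:

    NodeName=node020 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=257600 Gres=gpu:4

followed by restarting slurmctld and slurmd so the new core count is picked up.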

Re: [slurm-users] Configuring SLURM on single node GPU cluster

2022-04-06 Thread Stephen Cousins
Hi Sushil, Try changing the NodeName specification to:

NodeName=localhost CPUs=96 State=UNKNOWN Gres=gpu:8

Also:

TaskPlugin=task/cgroup

Best, Steve
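
For task/cgroup to actually fence each job onto its allocated GPU, device constraint has to be enabled in cgroup.conf as well; a minimal sketch:

    # cgroup.conf
    ConstrainDevices=yes
    ConstrainCores=yes
    ConstrainRAMSpace=yes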

Re: [slurm-users] Configuring SLURM on single node GPU cluster

2022-04-06 Thread Kamil Wilczek
Hello, try to comment out the line:

AutoDetect=nvml

and then restart "slurmd" and "slurmctld". Job allocations to the same GPU might be an effect of automatic MPS configuration, though I'm not 100% sure:
https://slurm.schedmd.com/gres.html#MPS_Management

Kind Regards
-- Kamil Wilczek
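
A sketch of the suggested change (device paths assumed for an 8-GPU box):

    # gres.conf
    #AutoDetect=nvml
    Name=gpu File=/dev/nvidia[0-7]

    # then, on the node:
    systemctl restart slurmd slurmctld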

[slurm-users] Configuring SLURM on single node GPU cluster

2022-04-06 Thread Sushil Mishra
Dear SLURM users, I am very new to Slurm and need some help configuring it on a single-node machine. This machine has 8x Nvidia GPUs and a 96-core CPU. The vendor has set up a "LocalQ", but that somehow is running all the calculations on GPU 0. If I submit 4 independent jobs at a time, it starts ru
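
A quick check of whether jobs are being pinned to distinct devices (a sketch, assuming the gres is named gpu): each of several concurrent single-GPU jobs should report a different device:

    srun --gres=gpu:1 bash -c 'echo $CUDA_VISIBLE_DEVICES; nvidia-smi -L'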

Re: [slurm-users] [EXTERNAL] Re: Managing shared memory (/dev/shm) usage per job?

2022-04-06 Thread John Hanks
Thanks, Greg! This looks like the right way to do this. I will have to stop putting off learning to use spank plugins :)

griznog

On Wed, Apr 6, 2022 at 1:40 AM Greg Wickham wrote:
> Hi John, Mark,
>
> We use a spank plugin
> https://gitlab.com/greg.wickham/slurm-spank-private-tmpdir (this w
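
For reference, spank plugins like this one are enabled through plugstack.conf; a sketch (install path assumed, plugin arguments per its README):

    # /etc/slurm/plugstack.conf
    required /usr/lib64/slurm/private-tmpdir.so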