Re: [slurm-users] Configuring SLURM on single node GPU cluster

2022-04-06 Thread Stephen Cousins
Hi Sushil, Try changing the NodeName specification to: NodeName=localhost CPUs=96 State=UNKNOWN Gres=gpu:8 Also: TaskPlugin=task/cgroup Best, Steve On Wed, Apr 6, 2022 at 9:56 AM Sushil Mishra wrote: > Dear SLURM users, > > I am very new to slurm and need some help in configuring slurm in
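A minimal sketch of how the suggested settings might sit in the configuration files. The NodeName line and TaskPlugin come from the reply above; the GresTypes line and the gres.conf entry (including the /dev/nvidia[0-7] device paths) are assumptions about a typical 8-GPU NVIDIA node and are not from the thread.

```
# slurm.conf (fragment)
GresTypes=gpu                  # assumption: needed so slurmctld recognizes the "gpu" GRES
TaskPlugin=task/cgroup
NodeName=localhost CPUs=96 State=UNKNOWN Gres=gpu:8

# gres.conf (fragment); device paths are an assumption, adjust to the actual hardware
NodeName=localhost Name=gpu File=/dev/nvidia[0-7]
```

After editing, slurmctld and slurmd generally need a restart for node definition changes to take effect.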

Re: [slurm-users] Strange behaviour with dynamically linked binary in batch job

2022-03-31 Thread Stephen Cousins
Is using "#!/bin/bash -l" enough to make it work? On Thu, Mar 31, 2022 at 6:46 AM Sebastian Potthoff < s.potth...@uni-muenster.de> wrote: > Just a quick follow up, that I could resolve the issue. Maybe this helps > someone in the future. > > $BASH_ENV was pointing to a deprecated script, resetti

Re: [slurm-users] Question about sbatch options: -n, and --cpus-per-task

2022-03-24 Thread Stephen Cousins
If you want to have the same number of processes per node, like: #PBS -l nodes=4:ppn=8 then what I am doing (maybe there is another way?) is: #SBATCH --ntasks-per-node=8 #SBATCH --nodes=4 #SBATCH --mincpus=8 This is because "--ntasks-per-node" is actually "maximum number of tasks per node" and
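As a rough illustration, the three directives above could be placed in a batch script like the one below. The body of the script is an assumption added for illustration: it only reports the layout Slurm granted, and falls back to the requested values when run outside an allocation.

```shell
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --mincpus=8

# Inside a job, Slurm exports these variables; outside a job they are unset,
# so fall back to the values requested in the directives above.
NODES="${SLURM_JOB_NUM_NODES:-4}"
TASKS_PER_NODE="${SLURM_NTASKS_PER_NODE:-8}"
TOTAL_TASKS=$(( NODES * TASKS_PER_NODE ))

echo "nodes=${NODES} tasks_per_node=${TASKS_PER_NODE} total_tasks=${TOTAL_TASKS}"
```

Submitted with sbatch, this should correspond to the PBS request nodes=4:ppn=8, i.e. 32 tasks in total, provided the partition has nodes with at least 8 CPUs each.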

[slurm-users] Possible to have a node in two partitions with N cores in one partition and M cores in the other?

2022-02-11 Thread Stephen Cousins
I see at https://slurm.schedmd.com/cons_res_share.html that there are some ways to share a node between partitions but I don't see how to specify a set number of cores to each partition. Is this possible? If I have some nodes with 36 cores, is there a way to make 16 of them be in one partition and

Re: [slurm-users] sbatch - accept jobs above limits

2022-02-08 Thread Stephen Cousins
PM Stephen Cousins wrote: > What I'm saying is that the job might not be able to run in that > partition. Ever. The job might be asking for more resources than the > partition can provide. Maybe I'm wrong but it would help to know what the > partition definition is, along w

Re: [slurm-users] sbatch - accept jobs above limits

2022-02-08 Thread Stephen Cousins
ition have specified (both of these in slurm.conf) and then what the job is asking for. On Tue, Feb 8, 2022, 7:36 PM wrote: > Yes, the partition does not meet the requirements now. > > The job should still be submitted and wait until requirements are > available. > > > On 09.02.

Re: [slurm-users] sbatch - accept jobs above limits

2022-02-08 Thread Stephen Cousins
I think this message comes up when no nodes in that partition have the resources to meet the requirements. Can you show what the partition definition is in slurm.conf along with what the job is asking for? On Tue, Feb 8, 2022, 5:25 PM wrote: > > Dear all, > > sbatch jobs are im

Re: [slurm-users] Compute nodes cycling from idle to down on a regular basis ?

2022-02-02 Thread Stephen Cousins
Hi Jeremy, What is the value of TreeWidth in your slurm.conf? If there is no entry then I recommend setting it to a value a bit larger than the number of nodes you have in your cluster and then restarting slurmctld. Best, Steve On Wed, Feb 2, 2022 at 12:59 AM Jeremy Fix wrote: > Hi, > > A fol

Re: [slurm-users] Running 2 jobs on one node uses the same cores, 300x slowdown

2021-11-23 Thread Stephen Cousins
I'd take a look at: https://slurm.schedmd.com/cpu_management.html#Example2 I think this might be what you want: SelectType=select/cons_res SelectTypeParameters=CR_Core Best, Steve On Tue, Nov 23, 2021, 7:35 PM Anne Hammond wrote: > We are running slurm 20.11.2-1 from CentOS 7 rpms. > > Th

Re: [slurm-users] Unable to start slurmd service

2021-11-16 Thread Stephen Cousins
From: slurm-users on behalf of > Stephen Cousins > Reply to: Slurm User Community List > Date: Tuesday, 16 November 2021 at 19:09 > To: Slurm User Community List > Subject: Re: [slurm-users] Unable to start slurmd service > > > > I think you just need to use sco

Re: [slurm-users] Unable to start slurmd service

2021-11-16 Thread Stephen Cousins
I think you just need to use scontrol to "resume" that node. On Tue, Nov 16, 2021, 10:10 AM Jaep Emmanuel wrote: > Hi, > > > > It might be a newbie question since I'm new to slurm. > > I'm trying to restart the slurmd service on one of our Ubuntu box. > > > > The slurmd.service is defined by: >
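A hedged sketch of what "resuming" a node with scontrol can look like. The node name node01 is hypothetical (the thread does not name the node here); substitute the node that sinfo reports as down or drained.

```shell
#!/bin/bash
# Hypothetical node name; replace with the node reported as down.
NODE="node01"

# Build the command first so it can be inspected before it is run.
RESUME_CMD="scontrol update NodeName=${NODE} State=RESUME"
echo "${RESUME_CMD}"

# Only touch the cluster on a host where the Slurm client tools are installed.
if command -v scontrol >/dev/null 2>&1; then
    sinfo -R          # list down/drained nodes together with the recorded reason
    ${RESUME_CMD}     # requires Slurm administrator privileges
fi
```

Checking the reason shown by sinfo -R before resuming is usually worthwhile, since a node that went down for a real fault will simply be marked down again.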