Re: [slurm-users] Requirement to run longer jobs

2019-07-03 Thread Loris Bennett
Hi Chris, Chris Samuel writes: > On 3/7/19 8:49 am, David Baker wrote: > >> Does the above make sense or is it too complicated? > > [looks at our 14 partitions and 112 QOS's] > > Nope, that seems pretty simple. We do much the same here. Out of interest, how many partitions and QOSs would an av

Re: [slurm-users] dual slurmctld and slurmdbd

2019-07-03 Thread Brian Andrus
You're welcome :) If you aren't pleased with the timeouts, you may want to look at the SlurmctldTimeout in slurm.conf: SlurmctldTimeout The interval, in seconds, that the backup controller waits for the primary controller to respond before assuming control. The default value is 120 seconds. Ma

Re: [slurm-users] dual slurmctld and slurmdbd

2019-07-03 Thread Tina Fora
Thanks Brian Andrus and Chris Samuel. I was able to get it to work on our dev setup as primary/backup. Already had the shared state directory. If I take primary down it takes about two minutes for slurm commands to work again as the backup takes over. When I bring the primary back up it is a bit f

Re: [slurm-users] Requirement to run longer jobs

2019-07-03 Thread Chris Samuel
On 3/7/19 8:49 am, David Baker wrote: Does the above make sense or is it too complicated? [looks at our 14 partitions and 112 QOS's] Nope, that seems pretty simple. We do much the same here. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Requirement to run longer jobs

2019-07-03 Thread Thomas M. Payerle
The dual QoSes (or dual partition solution suggested by someone else) should both work to allow select users to submit jobs with longer run times. We use something like that on our cluster (though I confess it was our first Slurm cluster and we might have overdone it with QoSes causing scheduler to

Re: [slurm-users] Requirement to run longer jobs

2019-07-03 Thread Andy Georges
Hi, On Wed, Jul 03, 2019 at 03:49:44PM +, David Baker wrote: > Hello, > > > A few of our users have asked about running longer jobs on our cluster. > Currently our main/default compute partition has a time limit of 2.5 days. > Potentially, a handful of users need jobs to run up to 5 days. R

Re: [slurm-users] SLURM_NTASKS values in interactive and batch jobs

2019-07-03 Thread Chris Samuel
On 3/7/19 8:17 am, Lech Nieroda wrote: Is that the expected behaviour or a bug? I'm not seeing that here with 18.08.7 and salloc; I'm only seeing: SLURM_NTASKS=5. That's both with our default salloc command that pushes users out to the compute node they've been allocated or specifying /bin/b

[slurm-users] Requirement to run longer jobs

2019-07-03 Thread David Baker
Hello, A few of our users have asked about running longer jobs on our cluster. Currently our main/default compute partition has a time limit of 2.5 days. Potentially, a handful of users need jobs to run up to 5 days. Rather than allow all users/jobs to have a run time limit of 5 days I wonder

[slurm-users] SLURM_NTASKS values in interactive and batch jobs

2019-07-03 Thread Lech Nieroda
Hi all, there seems to be a discrepancy in the SLURM_NTASKS values depending on the job type. For example, let’s say the job requests 5 tasks (-n 5), is submitted with sbatch, then its job step uses only 1 task (e.g. srun -n 1). In that case you’ll see the following values (with every launcher):

Re: [slurm-users] dual slurmctld and slurmdbd

2019-07-03 Thread Chris Samuel
On 2/7/19 1:48 pm, Tina Fora wrote: We run mysql on a dedicated machine with slurmctld and slurmdbd running on another machine. Now I want to add another machine running slurmctld and slurmdbd and this machine will be on CentOS 7. Existing one is CentOS 6. Is this possible? Can I run two separat

Re: [slurm-users] dual slurmctld and slurmdbd

2019-07-03 Thread Ole Holm Nielsen
On 7/2/19 10:48 PM, Tina Fora wrote: We run mysql on a dedicated machine with slurmctld and slurmdbd running on another machine. Now I want to add another machine running slurmctld and slurmdbd and this machine will be on CentOS 7. Existing one is CentOS 6. Is this possible? Can I run two separat