Hi Chris,
Chris Samuel writes:
> On 3/7/19 8:49 am, David Baker wrote:
>
>> Does the above make sense or is it too complicated?
>
> [looks at our 14 partitions and 112 QOS's]
>
> Nope, that seems pretty simple. We do much the same here.
Out of interest, how many partitions and QOSs would an average …
You're welcome :)
If you aren't pleased with the timeouts, you may want to look at the
SlurmctldTimeout in slurm.conf:
SlurmctldTimeout
The interval, in seconds, that the backup controller waits for the
primary controller to respond before assuming control. The default value
is 120 seconds. May not exceed 65533.
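For reference, a minimal slurm.conf sketch of a primary/backup controller pair
(the hostnames and path here are made up; on releases older than 18.08 the
ControlMachine/BackupController keywords are used instead of a second
SlurmctldHost line):

    # slurm.conf (identical copy on both controllers and all nodes)
    SlurmctldHost=ctl-primary               # primary slurmctld
    SlurmctldHost=ctl-backup                # backup slurmctld, takes over after the timeout
    SlurmctldTimeout=120                    # seconds the backup waits for the primary
    StateSaveLocation=/shared/slurm/state   # must be on storage both controllers can reach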
Thanks Brian Andrus and Chris Samuel.
I was able to get it to work on our dev setup as primary/backup. We already
had the shared state directory. If I take the primary down it takes about two
minutes for Slurm commands to work again as the backup takes over. When I
bring the primary back up it is a bit f…
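In case it is useful, these are the commands we poke at while testing failover
(the output format varies a little between versions):

    scontrol ping        # reports whether the primary and backup slurmctld are up and responding
    scontrol takeover    # asks the backup to assume control right away instead of waiting
                         # out SlurmctldTimeout (handy before planned work on the primary)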
On 3/7/19 8:49 am, David Baker wrote:
Does the above make sense or is it too complicated?
[looks at our 14 partitions and 112 QOS's]
Nope, that seems pretty simple. We do much the same here.
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
The dual QoSes (or the dual-partition solution suggested by someone else)
should both work to allow select users to submit jobs with longer run
times. We use something like that on our cluster (though I confess it was
our first Slurm cluster and we might have overdone it with QoSes, causing
the scheduler to …
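For what it's worth, a rough sketch of the QOS route as I'd set it up (the QOS
name "long", the 5-day cap, and the user name are only placeholders; please
double-check the PartitionTimeLimit flag against your Slurm version, as I
remember it being what lets the QOS MaxWall override the partition's MaxTime):

    # create a QOS with a longer wall-time cap
    sacctmgr add qos long
    sacctmgr modify qos long set MaxWall=5-00:00:00 Flags=PartitionTimeLimit
    # hand it out only to the users who asked for it
    sacctmgr modify user where name=alice set qos+=long
    # those users then submit with
    sbatch --qos=long --time=5-00:00:00 job.sh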
Hi,
On Wed, Jul 03, 2019 at 03:49:44PM, David Baker wrote:
> Hello,
>
>
> A few of our users have asked about running longer jobs on our cluster.
> Currently our main/default compute partition has a time limit of 2.5 days.
> Potentially, a handful of users need jobs to run up to 5 days. Rather than
> allow all users/jobs to have a run time limit of 5 days I wonder …
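One way to handle that is a second "long" partition laid over the same nodes
and restricted to the users who need it (the partition names, node list, and
group below are invented; MaxTime=2-12:00:00 is your current 2.5-day limit):

    PartitionName=batch Nodes=node[001-100] Default=YES MaxTime=2-12:00:00
    PartitionName=long  Nodes=node[001-100] Default=NO  MaxTime=5-00:00:00 AllowGroups=longjobs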
On 3/7/19 8:17 am, Lech Nieroda wrote:
Is that the expected behaviour or a bug?
I'm not seeing that here with 18.08.7 and salloc; I'm only seeing:
SLURM_NTASKS=5
That's both with our default salloc command, which pushes users out to the
compute node they've been allocated, and when specifying /bin/b…
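If you want to reproduce the check quickly, something like this (assuming a
plain salloc with no SallocDefaultCommand tricks) prints what the allocation
environment reports:

    salloc -n 5 bash -c 'echo SLURM_NTASKS=$SLURM_NTASKS'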
Hello,
A few of our users have asked about running longer jobs on our cluster.
Currently our main/default compute partition has a time limit of 2.5 days.
Potentially, a handful of users need jobs to run up to 5 days. Rather than
allow all users/jobs to have a run time limit of 5 days, I wonder …
Hi all,
there seems to be a discrepancy in the SLURM_NTASKS values depending on the
job type.
For example, let’s say the job requests 5 tasks (-n 5), is submitted with
sbatch, then its job step uses only 1 task (e.g. srun -n 1). In that case
you’ll see the following values (with every launcher): …
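A small sbatch script along these lines makes the difference easy to compare
for each launcher (SLURM_STEP_NUM_TASKS is printed only for comparison;
nothing version-specific is assumed):

    #!/bin/bash
    #SBATCH -n 5
    echo "batch environment: SLURM_NTASKS=$SLURM_NTASKS"
    srun -n 1 bash -c 'echo "inside the step: SLURM_NTASKS=$SLURM_NTASKS SLURM_STEP_NUM_TASKS=$SLURM_STEP_NUM_TASKS"'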
On 2/7/19 1:48 pm, Tina Fora wrote:
We run MySQL on a dedicated machine, with slurmctld and slurmdbd running on
another machine. Now I want to add another machine running slurmctld and
slurmdbd, and this machine will be on CentOS 7. The existing one is CentOS 6.
Is this possible? Can I run two separate …
On 7/2/19 10:48 PM, Tina Fora wrote:
We run MySQL on a dedicated machine, with slurmctld and slurmdbd running on
another machine. Now I want to add another machine running slurmctld and
slurmdbd, and this machine will be on CentOS 7. The existing one is CentOS 6.
Is this possible? Can I run two separate …
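In case a concrete sketch helps, this is roughly how a second controller/dbd
machine is usually pointed at the same MySQL server (hostnames are invented;
the two slurmdbd instances act as primary and backup rather than as two
independent daemons):

    # slurmdbd.conf (same file on both dbd machines)
    DbdHost=dbd-old                         # existing machine
    DbdBackupHost=dbd-new                   # new machine
    StorageType=accounting_storage/mysql
    StorageHost=mysql-server                # the dedicated MySQL box

    # slurm.conf
    AccountingStorageHost=dbd-old
    AccountingStorageBackupHost=dbd-new
    SlurmctldHost=dbd-old
    SlurmctldHost=dbd-new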