On Tue, 14 Nov 2017 14:58:00 +
Zohar Roe MLM wrote:
> Hello,
> Trying again, with the slurm.conf this time.
>
> I have a cluster name: Autobot
> In this cluster I have servers:
> Optimus[1-10] and
> Megatron[1-10].
>
> I sent 3000 jobs with feature Optimus; some are running while others
> are pending. [...]
Hi Roy,
What command are you using to start the jobs?
On 11/14/2017 09:58 AM, Zohar Roe MLM wrote:
Hello,
Trying again, with the slurm.conf this time.
I have a cluster name: Autobot
In this cluster I have servers:
Optimus[1-10] and
Megatron[1-10].
I sent 3000 jobs with feature Optimus; some are running while others are
pending, which is OK.
But I have sent 1000 jobs to Megatron and they are all [...]
Hi guys,
Thanks for the replies.
I will try to add my slurm.conf tomorrow. Sadly, it's a bit of a problem
since it's on a cluster disconnected from the net, with no easy way of
getting it out :(
I will try tomorrow, in the hope that somebody can catch a bad
parameter.
Thanks,
Roy
On Nov 13,
Assuming you are using backfill, I suspect this is caused by the default
SchedulerParameters, specifically bf_max_job_test or other similar limits
that keep jobs deeper in the queue from ever being reviewed. Setting
DebugFlags=Backfill will help greatly in debugging these issues.
There are analogous parameters [...]
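The queue-depth argument can be illustrated with a toy model. This is a simplification, not Slurm's scheduler: it assumes each job needs one whole node, that all Optimus nodes are busy while the ten Megatron nodes are idle, that all 4000 jobs sit in one priority-ordered queue with the Optimus jobs first, and that a backfill pass examines only the first bf_max_job_test jobs. The helper names (Job, parse_scheduler_parameters, toy_backfill_pass, build_queue) and the example values 100 and 5000 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    feature: str  # node feature the job requested (hypothetical field)


def parse_scheduler_parameters(value: str) -> dict:
    """Split a SchedulerParameters value such as "bf_max_job_test=100,bf_continue"
    into {"bf_max_job_test": "100", "bf_continue": None}."""
    params = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition("=")
        params[key] = val if sep else None  # bare flags map to None
    return params


def toy_backfill_pass(queue, free_nodes, max_job_test):
    """Toy model, NOT Slurm's algorithm: examine at most max_job_test jobs in
    priority order and start each one that has an idle node with its feature.
    Jobs beyond that depth are never looked at in this pass."""
    started = []
    for job in queue[:max_job_test]:
        if free_nodes.get(job.feature, 0) > 0:
            free_nodes[job.feature] -= 1
            started.append(job.job_id)
    return started


def build_queue():
    # 3000 Optimus jobs ahead of 1000 Megatron jobs, as in the original report.
    queue = [Job(i, "Optimus") for i in range(3000)]
    queue += [Job(3000 + i, "Megatron") for i in range(1000)]
    return queue


if __name__ == "__main__":
    for setting in ("bf_max_job_test=100", "bf_max_job_test=5000"):
        depth = int(parse_scheduler_parameters(setting)["bf_max_job_test"])
        # Every Optimus node busy, all 10 Megatron nodes idle (one job per node).
        free = {"Optimus": 0, "Megatron": 10}
        started = toy_backfill_pass(build_queue(), free, depth)
        print(f"{setting}: started {len(started)} jobs")
```

With the smaller depth, every examined job is an Optimus job waiting for a busy node, so the Megatron jobs are never reached even though their nodes are idle. In a real cluster, the DebugFlags=Backfill log output is what would confirm or rule this out.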
I'm guessing you should have sent them to cluster Decepticon instead.
In all seriousness, though, please provide the conf file. You might have
accidentally set a maximum number of running jobs somewhere.
On Nov 13, 2017 7:28 AM, "Benjamin Redling" wrote:
Hi Roy,
On 11/13/17 2:37 PM, Roe Zohar wrote:
[...]
I sent 3000 jobs with feature Optimus; some are running while others
are pending, which is OK.
But I have sent 1000 jobs to Megatron and they are all pending,
stating they wait because of priority. Why is that?
BTW, if I change their pr[...]
Hello all,
I have a cluster name: Autobot
In this cluster I have servers:
Optimus[1-10] and
Megatron[1-10].
I sent 3000 jobs with feature Optimus; some are running while others are
pending, which is OK.
But I have sent 1000 jobs to Megatron and they are all pending, stating
they wait because of [...]