Paul, it would be incredibly helpful to reveal
* What version of Slurm you are using
* What Slurm commands you are using
* The mpirun command(s) that do effect what you desire
* Your slurm configuration -- preferably a copy of slurm.conf (with node
names and IP addresses obscured for security reasons)
Hi,
On my cluster I normally run LSP programs across multiple nodes with mpirun
(MVAPICH2) and can do that successfully, however I have always had trouble
getting it to run successfully with srun. Either it will error out or the
program will instead run multiple instances of the same program ac
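A minimal batch-script sketch of the two launch styles (program name and node counts are placeholders, not from the original message). When srun launches each rank as an independent copy of the program instead of one MPI job, it is usually because the MPI library was not built with Slurm/PMI support; MVAPICH2 can be configured for it (e.g. with --with-pm=slurm --with-pmi=pmi2 at build time), after which srun's PMI2 plugin lets the ranks join a single job:

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16

# Style 1: mpirun reads the Slurm allocation from the environment
# and launches the ranks itself.
mpirun -np $SLURM_NTASKS ./my_mpi_program

# Style 2: srun is the process launcher; --mpi=pmi2 selects the
# PMI2 plugin so the ranks wire up into one MPI job rather than
# running N independent copies.
srun --mpi=pmi2 ./my_mpi_program
```

`srun --mpi=list` shows which MPI plugins your Slurm installation supports.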
On Mon, Nov 13, 2017 at 10:15 AM, Benjamin Redling <
benjamin.ra...@uni-jena.de> wrote:
> On 11/12/17 4:52 PM, Gennaro Oliva wrote:
>> On Sun, Nov 12, 2017 at 10:03:18AM -0500, Will L wrote:
>>> I just tried `sudo apt-get remove --purge munge`, etc., and munge itself
>> this should have
Hi guys,
Thanks for the reply.
I will try to add my slurm.conf tomorrow. Sadly, it's a bit of a problem
since it's on a cluster disconnected from the net, with no easy way of
getting files out :(
I will try tomorrow, in the hope that somebody can spot a bad
parameter.
Thanks,
Roy
On Nov 13,
Assuming you are using backfill, I suspect this is caused by using the
default SchedulerParameters, specifically bf_max_job_test or other similar
limits that prevent jobs from being considered. Setting DebugFlags=Backfill
will help greatly in debugging these issues.
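As a sketch, the relevant slurm.conf lines might look like the following (the parameter values here are illustrative, not recommendations):

```
# Use the backfill scheduler.
SchedulerType=sched/backfill

# Raise the number of jobs the backfill scheduler will examine
# per cycle; with many queued jobs, the default limit can stop
# it before it ever reaches the later jobs in the queue.
SchedulerParameters=bf_max_job_test=1000

# Log detailed backfill decisions to the slurmctld log.
DebugFlags=Backfill
```

After editing slurm.conf, `scontrol reconfigure` applies the change without restarting the daemons.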
There are analogous parameters
I'm guessing you should have sent them to cluster Decepticon instead.
In all seriousness though, provide the conf file. You might have
accidentally set a maximum number of running jobs somewhere.
On Nov 13, 2017 7:28 AM, "Benjamin Redling" wrote:
> Hi Roy,
>
> On 11/13/17 2:37 PM, Roe Zohar
Now that there is a slurm-users mailing list, I thought I would share
something with the community that I have been working on to see if anyone else
is interested in it. I have a lot of students on my cluster and I really
wanted a way to show my users how efficient their jobs are, or let them know
Hi Roy,
On 11/13/17 2:37 PM, Roe Zohar wrote:
[...]
I sent 3000 jobs with feature Optimus and part are running while part
are pending, which is OK.
But I have sent 1000 jobs to Megatron and they are all pending,
stating they wait because of priority. Why is that?
BTW, if I change their pr
On 11/12/17 4:52 PM, Gennaro Oliva wrote:
On Sun, Nov 12, 2017 at 10:03:18AM -0500, Will L wrote:
I just tried `sudo apt-get remove --purge munge`, etc., and munge itself
this should have uninstalled slurm-wlm also, did you reinstall it with apt?
seems to be working fine. But I still g
Hi,
For example, this configuration in slurm.conf works fine:
NodeName=kilimanjaro CPUs=16 RealMemory=80419 Sockets=1 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
PartitionName=slurmtest Nodes=kilimanjaro Default=YES MaxTime=INFINITE State=UP
This configuration works also:
NodeName
Hello all,
I have a cluster named Autobot.
In this cluster I have servers:
Optimus[1-10] and
Megatron[1-10].
I sent 3000 jobs with feature Optimus and part are running while part are
pending, which is OK.
But I have sent 1000 jobs to Megatron and they are all pending, stating
they wait because of
Dear all,
we have tried to update to Slurm 17.02.9.
What do these messages mean (and how can we get rid of them):
slurmstepd error: slurm_persist_conn_open: No response to persist_init
slurmstepd error: slurmdbd: Sending PersistInit msg: No error
Thanks,
Ulf
--