Sean Brisbane writes:
> Does anyone have a feeling for why setting a high Priority on a partition
> makes jobs run in that partition first regardless that a job in a different
> Partition may have a much higher overall priority?
Perhaps because that is what it was designed to do? Did you try us
Thank you for your reply. I was running 18.08.1 and updated to
18.08.6. Everything was solved. Thank you.
On Tue, 12 Mar 2019 at 20:23, Eli V wrote:
>
> On Tue, Mar 12, 2019 at 1:14 AM Andy Leung Yin Sui wrote:
> >
> > Hi,
> >
> > I am new to slurm and want to use weight option to schedule the j
Rick,
The issue is that SLURM can only provide pmi2 support, and it seems
Open MPI only supports pmix.
One option is to rebuild SLURM with PMIx as explained by Daniel, and then
srun --mpi=pmix ...
If you do not want to (or cannot) rebuild SLURM, you can use the older
pmi or pmi2.
In that case,
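A rough sketch of that second route, assuming Open MPI was configured
with --with-pmi against Slurm's PMI2 libraries (the binary name and
task count below are placeholders):

  # list the PMI flavors this srun was built with
  srun --mpi=list
  # launch an Open MPI binary over Slurm's PMI2
  srun --mpi=pmi2 -n 4 ./mpi_hello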
Hi.
On 12/03/2019 22:53:36, Riccardo Veraldi wrote:
Hello,
after trying hard for over 10 days I am forced to
write to the list.
It appears that we have gotten to the bottom of this problem! We
discovered that we only seem to see this problem if our overnight test
script is run with "nohup," as we have been doing for several years.
Typically, we would see the mysterious cancellations about once every
other day, or 3-4 ti
Both your Slurm and OpenMPI config.logs would be helpful in debugging
here. Throw in your slurm.conf as well for good measure. Also, what type
of system are you running, what type of high speed fabric are you trying
to run on, and what's your driver stack look like?
I know the feeling and will
Hello,
after trying hard for over 10 days I am forced to write to the list.
I am not able to get SLURM to work with Open MPI. Open MPI-compiled binaries
won't run under Slurm, while all non-Open MPI programs run just fine under "srun".
I am using SLURM 18.08.5 building the rpm from the tarball: rpmbuild -ta
s
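A quick sanity check here (component names vary by Open MPI release) is
to see which PMI support Open MPI itself was built with:

  ompi_info | grep -i pmi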
Are you using the priority/multifactor plugin? What are the values of the
various Priority* weight factors?
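If you are not sure, pulling the priority settings straight out of the
running configuration is the quickest check:

  scontrol show config | grep -i ^Priority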
On Tue, Mar 12, 2019 at 12:42 PM Sean Brisbane wrote:
> Hi,
>
> Thanks for your help.
>
> Either setting qos or setting priority doesn't work for me. However I
> have found the cause if
Slurm should automatically block or reject jobs that can't run on that
partition in terms of memory usage for a single node. So you shouldn't
need to do anything. If you need something less than the max memory per
node then you will need to enforce some limits. We do this via a
job_submit lua
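For the hard-limit side, a minimal slurm.conf sketch (partition name,
node names and sizes here are made up) that rejects oversized requests
at submit time instead of leaving them pending forever:

  EnforcePartLimits=ALL
  PartitionName=batch Nodes=node[01-16] MaxMemPerNode=192000 DefMemPerCPU=4000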
Hi,
Thanks for your help.
Neither setting qos nor setting priority works for me. However, I have
found the cause, if not the reason.
Using a Priority setting on the partition called "Priority" in slurm.conf
seems to force all jobs waiting on this queue to run first regardless of
any qos set
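As far as I can tell, the partition-level Priority= setting maps onto
both PriorityJobFactor= (which feeds the multifactor job priority) and
PriorityTier= (jobs in a higher tier are dispatched before lower tiers
regardless of per-job priority). A rough sketch of keeping the boost
without the tier effect (node list copied from the partition definition
quoted elsewhere in the thread):

  PartitionName=Priority Nodes=compute[01-02] PriorityJobFactor=10 PriorityTier=1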
Hello,
I have set up a serial queue to run small jobs in the cluster. Actually, I
route jobs to this queue using the job_submit.lua script. Any 1 node job using
up to 20 cpus is routed to this queue, unless a user submits their job with an
exclusive flag.
The partition is shared and so I def
If the failures happen right after the job starts (or close enough), I’d use an
interactive session with srun (or some other wrapper that calls srun, such as
fisbatch).
Our hpcshell wrapper for srun is just a bash function:
=
hpcshell ()
{
    srun --partition=interactive "$@" --pty bash -i
}
=
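Typical use then just passes normal srun options straight through, for
example (the option values are only examples):

  hpcshell -N 1 -c 4 --time=2:00:00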
Hello,
Some jobs have to be restarted several times until they run.
Users start the job, it fails, they have to make some changes,
they start the job again, it fails again ... and so on.
So they want to keep the resources until the job is running properly.
Is there a possibility to 'inherit' alloc
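One way to get something like that (a sketch; the size, time limit and
job script name are placeholders) is to hold the allocation with salloc
and rerun the job step inside it, so a failed attempt does not give the
nodes back:

  salloc -N 1 --time=4:00:00
  # inside the allocation, rerun as often as needed
  srun ./my_job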
On Tue, Mar 12, 2019 at 1:14 AM Andy Leung Yin Sui wrote:
>
> Hi,
>
> I am new to Slurm and want to use the weight option to schedule jobs.
> I have some machines with the same hardware configuration with GPU cards. I
> use QoS to force users to request at least 1 gpu gres when submitting
> jobs.
> The ma
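For the weight part itself, a minimal slurm.conf sketch (node names,
CPU counts and weights are made up; nodes with the lowest Weight are
allocated first, so non-GPU work lands on the plain nodes before the
GPU ones):

  NodeName=cpu[01-08] CPUs=40 Weight=1
  NodeName=gpu[01-04] CPUs=40 Gres=gpu:2 Weight=10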
Sean Brisbane writes:
> I'm trying to troubleshoot why the highest priority job is not next to run,
> jobs in the partition called "Priority" seem to run first.
>
[...]
> The partition called "Priority" has a priority boost assigned through qos.
>
> PartitionName=Priority Nodes=compute[01-02] De
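When chasing this, the per-job priority breakdown and a priority-sorted
view of the pending queue are usually the quickest things to compare:

  sprio -l                 # per-factor priority for every pending job
  squeue -t PD --sort=-p   # pending jobs, highest priority first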