Hi Chris,
Thank you for your reply regarding OpenMPI and srun. When I try to run an MPI
program using srun I find the following:
red[036-037]
[red036.cluster.local:308110] PMI_Init [pmix_s1.c:168:s1_init]: PMI is not
initialized
[red036.cluster.local:308107] PMI_Init [pmix_s1.c:168:s1_init]:
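For reference, the checks I plan to try next (assuming Slurm and OpenMPI were both built against PMIx; the program name and node/task counts below are only placeholders):

    # List the PMI plugins this Slurm installation provides
    srun --mpi=list

    # Launch with PMIx explicitly instead of the s1/s2 PMI shims
    srun --mpi=pmix -N 2 -n 4 ./my_mpi_program

    # Or make that the default in slurm.conf
    MpiDefault=pmix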
Hi Chris,
Thank you for your comments. Yesterday I experimented with increasing the
PriorityWeightJobSize and that does appear to have quite a profound effect on
the job mix executing at any one time. Larger jobs (needing 5 nodes or above)
are now getting a decent share of the nodes in the clu
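For completeness, a sketch of the sort of multifactor weights I am experimenting with (the numbers below are illustrative rather than our production values):

    # slurm.conf (excerpt)
    PriorityType=priority/multifactor
    PriorityWeightAge=10000
    PriorityWeightFairshare=100000
    PriorityWeightJobSize=100000
    PriorityWeightQOS=10000
    PriorityDecayHalfLife=7-0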
Hello,
A colleague suggested that larger jobs tend to get starved out on our Slurm
cluster. It's not a busy time at the moment, so it's
difficult to test this properly. Back in November it was not completely unusual
for a larger job to have to wait up to a week to start.
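Once the queue is busier, one way to check would be to look at the pending jobs' priority components and the scheduler's estimated start times, along these lines:

    sprio -l               # per-job age/fairshare/jobsize/qos contributions
    squeue -t PD --start   # estimated start times for pending jobs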
Hello,
Thank you for your comments on installing and using TurboVNC. I'm working on
the installation at the moment, and may get back with other questions relating
to the use of Slurm with VNC.
Best regards,
David
From: slurm-users on behalf of Daniel
Letai
Hello,
We have set up our NICE/DCV cluster and that is proving to be very popular.
There are, however, users who would benefit from using the resources offered by
our nodes with multiple GPU cards. This potentially means setting up TurboVNC,
for example. I would, if possible, like to be able t
Hello,
I wondered if someone could please help us to understand why the
PrologFlags=contain flag is causing jobs to fail and draining compute nodes. We
are, by the way, using Slurm 18.08.0. Has anyone else seen this behaviour?
I'm currently experimenting with PrologFlags=contain. I've found tha
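For reference, the relevant setting and the checks I am using (the job id below is a placeholder):

    # slurm.conf (excerpt)
    PrologFlags=contain

    # Confirm the running configuration picked it up
    scontrol show config | grep -i PrologFlags

    # With contain set, each job should gain a <jobid>.extern step
    sacct -j <jobid> --format=JobID,JobName,State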
Best regards,
David
From: slurm-users on behalf of Chris
Samuel
Sent: 20 November 2018 20:12:20
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Excessive use of backfill on a cluster
On Tuesday, 20 November 2018 11:42:49 PM AEDT Baker D. J. wrote:
> We are running Slu
Hi Lois,
Thank you for sharing your multifactor priority configuration with us. I understand
what you say about the QOS factor -- I've reduced it and increased the FS factor
to see where that takes us. Our QOS factor is only there to ensure that test
jobs gain a higher priority more quickly than other
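For context, the QOS priority itself lives on the QOS record; something along these lines (the QOS name and value here are illustrative):

    sacctmgr show qos format=Name,Priority
    sacctmgr modify qos where name=test set Priority=10000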
Hello,
Thank you for your reply and for the explanation. That makes sense -- your
explanation of backfill is as we expected. I think it's more that we are
surprised that almost all our jobs were being scheduled using backfill. We very
rarely see any being scheduled normally. It could be that w
Hello,
We are running Slurm 18.08.0 on our cluster and I am concerned that Slurm
appears to be using backfill scheduling excessively. In fact the vast majority
of jobs are being scheduled using backfill. So, for example, I have just
submitted a set of three serial jobs. They all started on a c
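One way to quantify this, rather than watching individual jobs, is sdiag, which reports how many jobs the main scheduler and the backfill scheduler have each started:

    sdiag    # compare the main scheduler counters with the "Backfilling stats" section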
Hello Mike et al,
This is a known bug in slurm v18.08*. We installed the initial release a short
while ago and came across this issue very quickly. We actually use this script
at the end of the job epilog to report job efficiency to users, and so it is a
real shame that it is now broken! The goo
Hello,
Thank you for your useful replies. It's certainly not anywhere near as difficult
as I initially thought. We should be able to start some tests later this week.
Best regards,
David
From: slurm-users on behalf of Roche
Ewan
Sent: 10 October 2018 08:07
To: S
Hello,
We are starting to think about developing a lua job submission script. For
example, we are keen to route jobs requiring no more than 1 compute node
(single core jobs and small parallel jobs) to a slurm shared partition. The
idea being that "small" jobs can share a small set of compute n
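As a starting point I have in mind something like the sketch below (the partition name "shared" and the single-node test are placeholders, and unset node counts would need more careful handling):

    function slurm_job_submit(job_desc, part_list, submit_uid)
       -- Route jobs that explicitly fit on one node to the shared partition,
       -- but only when the user has not requested a partition themselves.
       if job_desc.partition == nil and job_desc.max_nodes == 1 then
          job_desc.partition = "shared"
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end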
Hello,
We have just finished an upgrade to slurm 18.08. My last task was to reset the
slurmctld/slurmd timeouts to sensible values -- as they were set prior to the
update. That is:
SlurmctldTimeout = 60 sec
SlurmdTimeout    = 300 sec
With Slurm <18.08 I've reconfigured the clu
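For reference, the way I would expect to apply and verify the new values (assuming a plain reconfigure is enough rather than a daemon restart):

    scontrol reconfigure
    scontrol show config | grep -i timeout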
From: slurm-users on behalf of Chris
Samuel
Sent: 26 September 2018 11:26
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08
On Tuesday, 25 September 2018 11:54:31 PM AEST Baker D. J. wrote:
> That will certainly work, however the slur
David
From: slurm-users on behalf of Chris
Samuel
Sent: 25 September 2018 13:00
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08
On Tuesday, 25 September 2018 9:41:10 PM AEST Baker D. J. wrote:
> I guess that the only sol
Hello,
I wondered if I could compare notes with other community members who have
upgraded slurm on their cluster. We are currently running slurm v17.02 and I
understand that the rpm mix/structure changed at v17.11. We are, by the way,
planning to upgrade to v18.08.
I gather that I should upg
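The rough order I have in mind, hedged because we have not tried it here yet and assuming systemd units, is the usual dbd-first sequence:

    # 1. Back up the accounting database and the slurmctld StateSaveLocation
    # 2. Upgrade slurmdbd first and let it convert the database
    systemctl stop slurmdbd
    # install the new rpms (split at >=17.11 into slurm, slurm-slurmdbd,
    # slurm-slurmctld, slurm-slurmd, ...)
    systemctl start slurmdbd   # watch its log while the conversion runs
    # 3. Then slurmctld, then a rolling restart of slurmd on the compute nodes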
Hello,
I'm sure that this question has been asked before. We have recently added some
GPU nodes to our SLURM cluster.
There are 10 nodes each providing 2 * Tesla V100-PCIE-16GB cards
There are 10 nodes each providing 4 * GeForce GTX 1080 Ti cards
I'm aware that the simplest way to manage these
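What I have in mind is typed GRES, so that users can ask for a particular card; the node names below are made up:

    # slurm.conf (excerpt)
    GresTypes=gpu
    NodeName=gpu[001-010] Gres=gpu:v100:2        # plus the usual CPU/memory fields
    NodeName=gpu[011-020] Gres=gpu:gtx1080ti:4

    # gres.conf on a V100 node
    Name=gpu Type=v100 File=/dev/nvidia[0-1]

    # and in a job script
    #SBATCH --gres=gpu:v100:1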