On Wednesday, 25 April 2018 3:47:17 PM AEST Chris Samuel wrote:
> I'll open a bug just in case..
https://bugs.schedmd.com/show_bug.cgi?id=5097
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
On Wednesday, 25 April 2018 5:59:38 AM AEST Christopher Benjamin Coffey wrote:
> #define MAX_MSG_SIZE (16*1024*1024)
That is really really strange, there are 4 different definitions of that
symbol in the source code.
$ git grep 'define MAX_MSG_SIZE'
src/common/slurm_persist_conn.c:#define MA
We've gotten around the issue where we could not remove the runaway jobs. We
had to go the manual route of manipulating the db directly. We actually used a
great script that Loris Bennet wrote a while back. I haven't had to use it for
a long while - thanks again! :)
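For anyone hitting the same wall, the kind of statement involved is roughly the
following. This is only a rough sketch, not Loris's script: it assumes the
accounting database is called slurm_acct_db, the cluster is "mycluster", and the
stock <cluster>_job_table schema. Take a mysqldump first and restrict the WHERE
clause to the job ids that sacctmgr reported as runaway before running anything
like it:

    # mark orphaned job records as completed (3 = JOB_COMPLETE)
    mysql slurm_acct_db -e "
      UPDATE mycluster_job_table
         SET state = 3, time_end = time_start
       WHERE time_end = 0 AND time_start > 0;"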
An item of interest for the
Hi, we have an issue currently where we have a bunch (56K) of runaway jobs, but
we cannot clear them:
sacctmgr show runaway|wc -l
sacctmgr: error: slurmdbd: Sending message type 1488: 11: No error
sacctmgr: error: Failed to fix runaway job: Resource temporarily unavailable
58588
Has anyone run
I would likely crank up the debugging on the slurmd process and look at the log
files to see what's going on during that time. You could also watch the job via
top or other means (on Linux, pressing “1” in top shows a line per CPU core), or
use strace on the process itself. Presumably some
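For example, something along these lines (log path and debug level are
site-dependent; the PID below is hypothetical):

    # run slurmd in the foreground with extra verbosity, or set
    # SlurmdDebug=debug5 in slurm.conf and "scontrol reconfigure",
    # then follow the log
    slurmd -D -vvvv
    tail -f /var/log/slurmd.log

    # attach strace to the job's process to see what it is doing
    strace -f -tt -p <pid>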
How do you start it?
If you use Sys V style startup scripts, then likely /etc/init.d/slurm stop, but
if you're using systemd, then probably systemctl stop slurm.service (but I
don’t do systemd).
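Something like the following, though the exact script/unit name depends on how
Slurm was packaged on your nodes (the systemd unit for the compute-node daemon
is often slurmd.service):

    # Sys V style
    /etc/init.d/slurm stop

    # systemd
    systemctl stop slurmd.service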
Best,
Bill.
Sent from my phone
> On Apr 24, 2018, at 11:15 AM, Mahmood Naderan wrote:
>
> Hi Bi
Chris,
So the problem still exists ;)
>Yes, if you are happy
>for the asymmetry then you can do that.
That is the question. MaxCPUsPerNode sets the maximum core count
symmetrically for all nodes in the partition, so it is not applicable to
asymmetric cases.
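For example (node and partition names here are made up), MaxCPUsPerNode is a
single per-partition cap applied to every node, so something like this cannot
give the two nodes different limits:

    NodeName=node01 CPUs=32 State=UNKNOWN
    NodeName=node02 CPUs=16 State=UNKNOWN
    PartitionName=batch Nodes=node01,node02 MaxCPUsPerNode=16 Default=YES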
Regards,
Mahmood
On Mon, Apr 23
Hi Bill,
In order to shut down the slurm process on the compute node, is it fine
to kill /usr/sbin/slurm? Or is there a better and safer way to do that?
Regards,
Mahmood
On Sun, Apr 22, 2018 at 5:44 PM, Bill Barth wrote:
> Mahmood,
>
> If you have exclusive control of this system and can afford
On Tuesday, 24 April 2018 5:40:22 PM AEST Diego Zuccato wrote:
> I'd say they do *not* act as a single partition... Unless I missed some
> key detail, once a node is assigned a job in a partition, it's
> unavailable *as a whole* to other partitions.
No, that's not right, we have overlapping partitions
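For example (hypothetical slurm.conf fragment), two partitions can share the
same nodes; a job running on node01 via "short" only ties up the CPUs and
memory it was allocated, and the rest of node01 can still be scheduled from
"long" unless the job took the node exclusively:

    PartitionName=short Nodes=node[01-04] MaxTime=04:00:00
    PartitionName=long  Nodes=node[01-04] MaxTime=7-00:00:00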
On 20/04/2018 15:56, Renfro, Michael wrote:
> Not sure how to answer if they “essentially act as a single partition”,
> though. Resources allocated to a job in a given partition are unavailable to
> other jobs, regardless of what partition they’re in.
I'd say they do *not* act as a single partition... Unless I missed some key
detail, once a node is assigned a job in a partition, it's unavailable *as a
whole* to other partitions.