Hello, all:
Details:
* slurm 20.02.6
* MariaDB 10.3.17
* RHEL 8.1
I have a fairshare setup. In testing, I went through a couple of iterations of
manually creating accounts and users, which I later deleted before putting in
what is to be the production setup.
One of the deleted accounts
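(For context, the kind of sacctmgr commands involved in that test workflow; the account and user names below are placeholders rather than the ones actually used:)
$ sacctmgr add account test_acct Description="fairshare testing"
$ sacctmgr add user alice Account=test_acct
$ sacctmgr delete user alice Account=test_acct
$ sacctmgr delete account test_acct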
On 2021/02/10 09:33, Christopher Samuel wrote:
Also, getting users to use `sacct` rather than `squeue` to check what
state a job is in can help a lot too, as it reduces the load on slurmctld.
That raises an interesting take on the two utilities, Chris,
in that
1) It should be possible to write a
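(To make the sacct suggestion above concrete; the job ID and format fields are invented for the example:)
$ sacct -j 123456 --format=JobID,JobName,State,Elapsed,ExitCode
$ sacct --starttime=today --state=PENDING,RUNNING --format=JobID,State
Both commands query the accounting database through slurmdbd, so they avoid adding to the RPC load on slurmctld, which is what squeue talks to.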
On 2/9/21 5:08 pm, Paul Edmon wrote:
1. Being on the latest release: a lot of work has gone into improving
RPC throughput; if you aren't running the latest 20.11 release, I highly
recommend upgrading. 20.02 also was pretty good at this.
We've not gone to 20.11 on production systems yet, but I
We've hit this before several times. The tricks we've used to deal with
this are:
1. Being on the latest release: a lot of work has gone into improving
RPC throughput; if you aren't running the latest 20.11 release, I highly
recommend upgrading. 20.02 also was pretty good at this.
2. max_rpc
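(Item 2 is cut off above; assuming it refers to the max_rpc_cnt SchedulerParameters option, a minimal slurm.conf sketch could look like the following, with 150 as a purely illustrative value:)
# slurm.conf, sketch only; max_rpc_cnt and its value are assumptions based on the truncated item 2
SchedulerParameters=max_rpc_cnt=150
# Once the number of outstanding RPCs reaches this count, slurmctld defers some
# scheduling passes so it can keep up with client requests.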
Hello guys,
In our cluster, a new incoming member sometimes accidentally generates too many
Slurm RPC calls (sbatch, sacct, etc.), and then slurmctld,
slurmdbd, and MySQL may become overloaded.
To prevent such a situation, I'm looking for something like an RPC rate limit for
users. Does Slurm support such a rate limit?
Hello all,
I've noticed an odd behaviour with job steps in some Slurm environments.
When a script is launched directly as a job, the output is written to file
immediately. When the script is launched as a step in a job, output is
written in ~30-second chunks. This doesn't happen in all Slurm
environments.
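(A hypothetical sketch of the two launch modes being compared; the script name and the use of --wrap are my own additions, not details from the report:)
# Script launched directly as the job; output reportedly reaches the file immediately:
$ sbatch --wrap='./myscript.sh'
# Script launched as a job step via srun inside the batch job; output reportedly arrives in ~30-second chunks:
$ sbatch --wrap='srun ./myscript.sh'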
Well, I seem to have figured it out. This worked and did what I wanted to (I
think):
$ sudo sacctmgr archive dump Directory=/data/Backups/Slurm \
    PurgeEventAfter=1hour PurgeJobAfter=1hour PurgeStepAfter=1hour \
    PurgeSuspendAfter=1hour PurgeUsageAfter=1hour Events Jobs Ste
Hi,
We have a similar configuration: a very heterogeneous cluster with cons_tres.
Users need to specify the CPU/memory/GPU/time, and it will schedule their
job somewhere. Indeed there's currently no guarantee that you won't be left
with a node with unusable GPUs because no CPUs or memory are available.
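(For illustration, a request of the kind described, where the user spells out CPU, memory, GPU, and time; all of the values are made up:)
$ sbatch --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=02:00:00 job.sh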
Hi Jianwen,
I guess the -p or -P flag does what you want?
Best regards,
Angelos
(Sent from mobile, please pardon me for typos and cursoriness.)
> On 9/2/2021 at 21:46, SJTU wrote:
>
> Hi,
>
> I am using SLURM 19.05.7. Is it possible to insert user-defined
> separating characters like "|" or ","
Hi,
I am using SLURM 19.05.7. Is it possible to insert user-defined
separating characters like "|" or "," into sacct's formatted output? That
would make it easier to parse the fields.
Thank you!
Jianwen
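(A short illustration of the -p/-P flags Angelos suggests; the format fields are arbitrary, and --delimiter is a further option I believe sacct accepts, though I have not checked it against 19.05 specifically:)
$ sacct -P --format=JobID,JobName,State,Elapsed      # fields separated by "|", no trailing delimiter
$ sacct -p --delimiter=',' --format=JobID,State      # use "," instead; -p also appends a trailing delimiter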
Yes, the problem was that slurmd could not find munge, so I added the prefix to the
configure command and now it works.
Thank you
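(Presumably the change was something along these lines; the paths are assumptions on my part, and --with-munge is the configure flag Slurm provides for pointing at a munge installation outside the default search path:)
$ ./configure --prefix=/usr/local --with-munge=/opt/munge
$ make && sudo make install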
From: slurm-users on behalf of Tina
Friedrich
Sent: Tuesday, February 9, 2021 1:22:14 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-
That looks odd. I mean, I think it very straightforwardly wants to tell
you that you've configured AuthType=auth/munge and SLURM can't find the
auth_munge plugin. I didn't think you could even build SLURM without it
finding munge; that's what puzzles me :)
What version of SLURM is this?
How d
Dear slurm user community,
I am trying to set up a Slurm cluster, but I am getting the following error:
# exec /usr/local/sbin/slurmd -v -D -f /etc/slurm/slurm.conf
slurmd: Node configuration differs from hardware: CPUs=10:72(hw) Boards=1:1(hw)
SocketsPerBoard=10:2(hw) CoresPerSocket=1:18(hw) Th
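(Reading the error: the number before each colon is what slurm.conf declares, and the number before "(hw)" is what slurmd detected. A NodeName line matching the detected hardware might look like the sketch below; the node name is a placeholder, and ThreadsPerCore=2 is inferred from 72 CPUs across 2 sockets x 18 cores, since that part of the message is cut off:)
NodeName=node01 CPUs=72 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=2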