Hi Ole,
we had a similar issue on our systems. As I understand it from the bug you
linked, we just need to wait until all the old jobs are finished (and
the old slurmstepd processes are gone). So a full drain should not be necessary?
Best,
Marcus
On 05.05.22 13:53, Ole Holm Nielsen wrote:
Just a heads-up
Hi Loris,
I can confirm the problem: I am not able to modify
job_desc.mail_user. Other values can be modified, though.
We are also on 21.08.5.
Best,
Marcus
On 10.01.22 11:14, Loris Bennett wrote:
Hi,
Does setting 'mail_user' in job_submit.lua actually work in Slurm
21.08.5?
I have, if
I have already answered tons of tickets because of this, from users who are
confused that their job silently fails.
The problem is that you cannot solve this with a job_submit or cli_filter
plugin, as you do not know the state of the file system at job runtime, or
even on the node the job ends up on.
At least the slurm
Hi Doug,
Slurm has the strigger[1] mechanism that can do exactly that; the
manpage even has your use case as an example. It works quite well for us.
Best,
Marcus
[1] https://slurm.schedmd.com/strigger.html
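If it is the usual "notify me when a node goes down" case, the manpage example
looks roughly like this (the program path is whatever notification script you
use; triggers have to be set as the SlurmUser or root, and need --flags=PERM if
they should persist after firing once):
  strigger --set --node --down --flags=PERM --program=/usr/sbin/slurm_admin_notify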
On 26.06.21 19:10, Doug Niven wrote:
Hi Folks,
I’d like to set up an email notificati
mails.
Regarding strigger, I don't know how to become the slurm user. "su slurm"
complains: "This account is currently not available." The user "slurm"
exists and is the SlurmUser.
Best,
On Mon, Jun 14, 2021 at 5:09 AM Ole Holm Nielsen
wrote:
On 6/14/21 7:50 AM,
Hi,
you will need to use sacct to get the information from the slurmdbd.
It's not the same information and you will have to find the right fields
to display, but it is pretty powerful. Have a look at the man page for
the available fields.
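As a rough starting point (the job ID and the fields are just examples, adjust
them to what you need):
  sacct -j 12345 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS,ReqMem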
Best,
Marcus
On 14.06.21 08:26, Gestió Servidors wro
Hi,
Slurm provides the strigger[1] utility for that. You can set it up to
automatically send mails when nodes go into drain.
Best,
Marcus
[1] https://slurm.schedmd.com/strigger.html
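A minimal sketch (the script path is a placeholder for your own mail script,
and the trigger needs to be set as the SlurmUser or root):
  strigger --set --node --drained --flags=PERM --program=/usr/local/sbin/mail_drained_nodes.sh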
On 12.06.21 22:29, Rodrigo Santibáñez wrote:
Hi SLURM users,
Does anyone have a cronjob or similar to monito
Hi,
as per
https://slurm.schedmd.com/archive/slurm-18.08.5/sbatch.html#OPT_nodelist
Request a specific list of hosts. The job will contain *all* of these hosts and
possibly additional hosts as needed to satisfy resource requirements.
So at least in the sbatch manpage it explicitly states t
I have the same in our config.log and the X11 forwarding works fine. There are
no other lines around it (about failing checks or anything), just this:
[...]
configure:22134: WARNING: unable to locate rrdtool installation
configure:22176: support for ucx disabled
configure:22296: checking whether Slu
Hi Thekla,
it has been built in by default for some time now. You need to activate it by
adding
PrologFlags=X11
to your slurm.conf (see here:
https://slurm.schedmd.com/slurm.conf.html#OPT_PrologFlags)
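Once slurm.conf is updated everywhere (a restart of the daemons may be needed
for PrologFlags to take effect), forwarding can then be requested per job with
something like:
  srun --x11 --pty bash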
Best,
Marcus
On 27.05.21 14:07, Thekla Loizou wrote:
Dear all,
I am trying to use X11 forward
Hi everyone,
On 08.04.21 02:13, Christopher Samuel wrote:
I've not had issues with naming partitions in the past, though I can
imagine `default` could cause confusion as there is a `default=yes`
setting you can put on the one partition you want as the default choice.
more than that. The Parti
hat on our system, so I don't think
that's necessary.
Best,
Marcus
On 10.03.21 12:06, Reuti wrote:
Am 09.03.2021 um 13:37 schrieb Marcus Boden :
Then I have good news for you! There is the --delimiter option:
https://slurm.schedmd.com/sacct.html#OPT_delimiter=
Aha, perfect – thx
Then I have good news for you! There is the --delimiter option:
https://slurm.schedmd.com/sacct.html#OPT_delimiter=
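For example, something along these lines (the fields are just an illustration):
  sacct -p --delimiter=';' --format=JobID,JobName,State,Elapsed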
Best,
Marcus
On 09.03.21 12:10, Reuti wrote:
Hi:
Am 09.03.2021 um 08:19 schrieb Bjørn-Helge Mevik :
"xiaojingh...@163.com" writes:
I am doing a parsing job on slurm fields.
Hi Xiaojing,
my experience here is: you will have to try it out and see what works.
At least that's what I do whenever I parse sacct, as I did not find a
detailed description anywhere. The manpage is quite incomplete in that
regard.
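Two things that usually help here: listing all available field names, and
asking for header-less, parsable output. A rough example:
  sacct -e
  sacct -nP --format=JobID,State,ExitCode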
Best,
Marcus
On 05.03.21 03:02, xiaojingh...@163.com wrote
Hi,
you could write a job_submit plugin:
https://slurm.schedmd.com/job_submit_plugins.html
The site factor was added to the priority calculation for exactly that reason.
Best,
Marcus
On 11/12/20 10:58 AM, SJTU wrote:
> Hello,
>
> We want to raise the priority of a certain kind of slurm jobs. We considered
> do
Hi Jianwen,
yes, you can give individual accounts or users an extra priority.
You can set it via sacctmgr:
https://slurm.schedmd.com/sacctmgr.html#SECTION_GENERAL-SPECIFICATIONS-FOR-ASSOCIATION-BASED-ENTITIES
(scroll down to 'Priority')
Priority
What priority will be added to a job's pr
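For example, with hypothetical user and account names:
  sacctmgr modify user where name=alice account=proj_a set Priority=100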
Hi,
you can add those as "Features" for the nodes, see:
https://slurm.schedmd.com/slurm.conf.html#OPT_Feature
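A rough sketch of what that could look like (node and feature names are made
up); jobs then select the features with --constraint:
  NodeName=node[01-04] ... Features=infiniband,xeon_gold
  sbatch --constraint=infiniband job.sh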
Best,
Marcus
On 9/3/20 2:52 PM, Gestió Servidors wrote:
> Hello,
>
> I would like to apply some constraint options to my nodes. For example,
> infiniband available, processor model, et
Hi Navin,
try running slurmd in the foreground with increased verbosity:
slurmd -D -v (add as many v's as you deem necessary)
Hopefully it will tell you more about why it times out.
Best,
Marcus
On 6/11/20 2:24 PM, navin srivastava wrote:
> Hi Team,
>
> when i am trying to start the slurmd process
Hi,
> Some minutes ago, I have applied "MaxJobs=3" for an user. After that, if I
> ran "sacctmgr -s show user MYUSER format=account,user,maxjobs", system showed
> a "3" at the maxjobs column. However, now, I have run a "squeue" and I'm
> seeing 4 jobs (from that user) in "running" state... Shou
Hi,
the default time window starts at 00:00:00 of the current day:
-S, --starttime
Select jobs in any state after the specified time. Default
is 00:00:00 of the current day, unless the '-s' or '-j'
options are used. If the '-s' option is used, then the
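So to look further back, give an explicit window, for example (the dates and
user name are just placeholders):
  sacct -S 2020-06-01 -E now -u someuser --format=JobID,State,Start,End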
Hi,
you're looking for 'associations' between users, accounts, and their limits.
Try `sacctmgr show assoc [tree]`
Best,
Marcus
On 20-02-28 09:38, Matthias Krawutschke wrote:
> Dear Slurm-User,
>
>
>
> I have a simple question about User and Account Management on SLURM.
>
>
>
> How can I f
Hi everyone,
I am facing a bit of a weird issue with CPU bindings and mpirun:
My jobscript:
#SBATCH -N 20
#SBATCH --tasks-per-node=40
#SBATCH -p medium40
#SBATCH -t 30
#SBATCH -o out/%J.out
#SBATCH -e out/%J.err
#SBATCH --reservation=root_98
module load impi/2019.4 2>&1
export I_MPI_DEBUG=6
exp
We had this issue recently. Some googling led me to the NERSC FAQs,
which state:
> _is_a_lwp is a function called internally for Slurm job accounting. The
> message indicates a rare error situation with a function call. But the error
> shouldn't affect anything in the user job. Please ignore t
Hi,
to your first question: I don't know the exact reason, but SchedMD has made
it pretty clear that there is a specific sequence for updates:
slurmdbd -> slurmctld -> slurmd -> commands
See https://slurm.schedmd.com/SLUG19/Field_Notes_3.pdf (or any of the
other field notes) for details.
So, I'd advis
you can also use the UnkillableStepProgram to debug things:
> UnkillableStepProgram
> If the processes in a job step are determined to be unkillable for a
> period of time specified by the UnkillableStepTimeout variable, the program
> specified by UnkillableStepProgram will be executed. This
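In slurm.conf that would look something like this (the script path is a
placeholder, the timeout is in seconds):
  UnkillableStepProgram=/usr/local/sbin/debug_unkillable.sh
  UnkillableStepTimeout=120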
Hi Jürgen,
you're looking for KillOnBadExit in the slurm.conf:
KillOnBadExit
If set to 1, a step will be terminated immediately if any task is crashed
or aborted, as indicated by a non-zero exit code. With the default value of 0,
if one of the processes is crashed or aborted the other proces
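So in slurm.conf simply:
  KillOnBadExit=1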
Hey everyone,
I am using Telegraf and InfluxDB to monitor our hardware and I'd like to
include some Slurm metrics in this. Is there already a Telegraf plugin
for monitoring Slurm that I don't know about, or do I have to start from
scratch?
Best,
Marcus
--
Marcus Vincent Boden, M.Sc.
Arbeitsgruppe e
Hello Doug,
to quote the slurm.conf man page:
It would be preferable to allocate smaller memory nodes rather than
larger memory nodes if either will satisfy a job's requirements.
So I guess the idea is that if a smaller node satisfies all the
requirements, why 'waste' a bigger one on it? It makes sense
Hi Fabio,
are you sure that command substitution works in the #SBATCH part of the
jobscript? I don't think that Slurm actually evaluates that, though I
might be wrong.
It seems like the #SBATCH lines after the --job-name line are not evaluated
anymore, which is why you can't start srun with two tasks (since
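One way to illustrate the point: a directive like
  #SBATCH --job-name=run_$(date +%F)
is never expanded by a shell, whereas passing the option on the command line,
  sbatch --job-name="run_$(date +%F)" job.sh
lets your login shell do the substitution before sbatch ever sees it. At least
that is my understanding of it.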
>
> Yeah, on our systems, I get:
> Sorry, gawk version 4.0 or later is required. Your version is: GNU Awk
> 3.1.7
> (RHEL 6). So this one wasn't as useful for me. But thanks anyway!
Just an FYI: Building gawk locally is pretty easy (a simple configure,
make, make install), so that might b
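Roughly (the version number is just a placeholder), installed into your home
directory so no root access is needed:
  tar xf gawk-5.1.0.tar.gz
  cd gawk-5.1.0
  ./configure --prefix=$HOME/opt/gawk
  make && make install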
Hi,
this is usually due to a misconfiguration in your gres.conf (at least it
was for me). Can you show your gres.conf?
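For comparison, a minimal setup for two GPUs usually looks something like this
(the device paths are an assumption); the gres name has to match between
gres.conf, GresTypes, and the node's Gres= entry:
gres.conf:
  Name=gpu File=/dev/nvidia[0-1]
slurm.conf:
  GresTypes=gpu
  NodeName=yournode ... Gres=gpu:2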
Best,
Marcus
On 19-06-27 15:33, Valerio Bellizzomi wrote:
> hello, my node has 2 gpus so I have specified gres=gpus:2 but the
> scontrol show node displays this:
>
> State=IDLE