Re: [slurm-users] CommunicationParameters=block_null_hash issue in 21.08.8

2022-05-05 Thread Marcus Boden
Hi Ole, we had similar issues on our systems. As I understand from the bug you linked, we just need to wait until all the old jobs are finished (and the old slurmstepd are gone). So a full drain should not be necessary? Best, Marcus On 05.05.22 13:53, Ole Holm Nielsen wrote: Just a heads-u

Re: [slurm-users] Does setting 'job_desc.mail_user' in job_submit.lua work?

2022-01-10 Thread Marcus Boden
Hi Loris, I can confirm the problem: I am not able to modify the job_desc.mail_user. Other values can be modified, though. We are also on 21.08.5. Best, Marcus On 10.01.22 11:14, Loris Bennett wrote: Hi, Does setting 'mail_user' in job_submit.lua actually work in Slurm 21.08.5? I have, if

Re: [slurm-users] Bug: incorrect output directory fails silently

2021-07-08 Thread Marcus Boden
I have already answered tons of tickets about this from users who are confused that the job silently fails. The problem is that you cannot solve this with a job_submit or cli_filter, as you do not know the state of the file system at job runtime. Or even on the node in the end. At least the slurm

Re: [slurm-users] Best method to determine if a node is down

2021-06-27 Thread Marcus Boden
Hi Doug, Slurm has the strigger[1] mechanism that can do exactly that; the manpage even has your use case as an example. It works quite well for us. Best, Marcus [1] https://slurm.schedmd.com/strigger.html On 26.06.21 19:10, Doug Niven wrote: Hi Folks, I’d like to set up an email notificati

Re: [slurm-users] monitor draining/drain nodes

2021-06-14 Thread Marcus Boden
mails. Regarding strigger, I don't know how to become the slurm user. "su slurm" complains "This account is currently not available.". The user "slurm" exists and is the SlurmUser. Best, On Mon, Jun 14, 2021 at 5:09 AM Ole Holm Nielsen wrote: On 6/14/21 7:50 AM,

Re: [slurm-users] Information about finished jobs

2021-06-13 Thread Marcus Boden
Hi, you will need to use sacct to get the information from the slurmdbd. It's not the same information and you will have to find the right fields to display, but it is pretty powerful. Have a look at the man page for the available fields. Best, Marcus On 14.06.21 08:26, Gestió Servidors wro

Re: [slurm-users] monitor draining/drain nodes

2021-06-13 Thread Marcus Boden
Hi, Slurm provides the strigger[1] utility for that. You can set it up to automatically send mails when nodes go into drain. Best, Marcus [1] https://slurm.schedmd.com/strigger.html On 12.06.21 22:29, Rodrigo Santibáñez wrote: Hi SLURM users, Does anyone have a cronjob or similar to monito
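
As a rough illustration of the strigger suggestion above, here is a minimal Python sketch that only assembles the command line for registering a trigger on drained nodes. The script path `/usr/local/sbin/notify_drained.sh` is a hypothetical placeholder for a site-specific mail script, and the helper name is made up for this example. As far as I know, triggers normally have to be set by the SlurmUser, so the resulting command would be run under that account.

```python
import shlex


def build_strigger_command(program, event="drained", permanent=True):
    """Return the argv list for registering a node-state trigger.

    `program` is the script Slurm runs when the event fires
    (hypothetical path in the example below).
    """
    allowed = {"down", "drained", "up"}
    if event not in allowed:
        raise ValueError(f"unsupported event: {event}")
    argv = ["strigger", "--set", "--node", f"--{event}", f"--program={program}"]
    if permanent:
        # Without PERM the trigger is removed after it fires once.
        argv.append("--flags=PERM")
    return argv


# Assumed location of a site-specific notification script.
cmd = build_strigger_command("/usr/local/sbin/notify_drained.sh")
print(shlex.join(cmd))
```

The sketch prints the command instead of executing it, so it can be reviewed before being run as the SlurmUser.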

Re: [slurm-users] Conflicting --nodes and --nodelist

2021-06-01 Thread Marcus Boden
Hi, as per https://slurm.schedmd.com/archive/slurm-18.08.5/sbatch.html#OPT_nodelist Request a specific list of hosts. The job will contain *all* of these hosts and possibly additional hosts as needed to satisfy resource requirements. So at least in the sbatch manpage it explicitly states t

Re: [slurm-users] Building SLURM with X11 support

2021-05-28 Thread Marcus Boden
I have the same in our config.log and the x11 forwarding works fine. No other lines around it (about some failing checks or something), just this: [...] configure:22134: WARNING: unable to locate rrdtool installation configure:22176: support for ucx disabled configure:22296: checking whether Slu

Re: [slurm-users] Building SLURM with X11 support

2021-05-27 Thread Marcus Boden
Hi Thekla, it is built in by default since... some time. You need to activate it by adding PrologFlags=X11 to your slurm.conf (see here: https://slurm.schedmd.com/slurm.conf.html#OPT_PrologFlags) Best, Marcus On 27.05.21 14:07, Thekla Loizou wrote: Dear all, I am trying to use X11 forward

Re: [slurm-users] PartitionName default

2021-04-07 Thread Marcus Boden
Hi everyone, On 08.04.21 02:13, Christopher Samuel wrote: I've not had issues with naming partitions in the past, though I can imagine `default` could cause confusion as there is a `default=yes` setting you can put on the one partition you want as the default choice. more than that. The Parti

Re: [slurm-users] How can I get complete field values with without specify the length

2021-03-10 Thread Marcus Boden
hat on our system, so I don't think that's necessary. Best, Marcus On 10.03.21 12:06, Reuti wrote: Am 09.03.2021 um 13:37 schrieb Marcus Boden : Then I have good news for you! There is the --delimiter option: https://slurm.schedmd.com/sacct.html#OPT_delimiter= Aha, perfect – thx

Re: [slurm-users] How can I get complete field values with without specify the length

2021-03-09 Thread Marcus Boden
Then I have good news for you! There is the --delimiter option: https://slurm.schedmd.com/sacct.html#OPT_delimiter= Best, Marcus On 09.03.21 12:10, Reuti wrote: Hi: Am 09.03.2021 um 08:19 schrieb Bjørn-Helge Mevik : "xiaojingh...@163.com" writes: I am doing a parsing job on slurm fields.
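
To make the --delimiter hint concrete: a minimal sketch, assuming the output comes from something like `sacct -P --delimiter=';' --format=JobID,JobName,State` (the delimiter is only applied together with the parsable options -p or -P). The sample text below is invented for illustration and is not real sacct output; the function name is likewise hypothetical.

```python
def parse_sacct(text, delimiter="|"):
    """Parse `sacct -P` style output into a list of dicts keyed by header."""
    lines = [ln for ln in text.splitlines() if ln]
    if not lines:
        return []
    header = lines[0].split(delimiter)
    # -P (parsable2) has no trailing delimiter, so a plain split is enough.
    return [dict(zip(header, ln.split(delimiter))) for ln in lines[1:]]


# Invented sample, shaped like parsable2 output with ';' as the delimiter.
sample = (
    "JobID;JobName;State\n"
    "1234;a_rather_long_job_name_that_is_not_cut_off;COMPLETED\n"
    "1234.batch;batch;COMPLETED\n"
)

for row in parse_sacct(sample, delimiter=";"):
    print(row["JobID"], row["JobName"], row["State"])
```

Picking a delimiter that cannot occur in job names keeps the split unambiguous, and the parsable format prints the full field values regardless of their length.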

Re: [slurm-users] About sacct --format: how can I get info about the fields

2021-03-05 Thread Marcus Boden
Hi Xiaojing, my experience here is: you will have to try it out and see what works. At least that's what I do whenever I parse sacct, as I did not find a detailed description anywhere. The manpage is quite incomplete in that regard. Best, Marcus On 05.03.21 03:02, xiaojingh...@163.com wrote

Re: [slurm-users] Raise the priority of a certain kind of jobs

2020-11-12 Thread Marcus Boden
Hi, you could write a job_submit plugin: https://slurm.schedmd.com/job_submit_plugins.html The Site factor was added to priority for that exact reason. Best, Marcus On 11/12/20 10:58 AM, SJTU wrote: > Hello, > > We want to raise the priority of a certain kind of slurm jobs. We considered > do

Re: [slurm-users] How to set association factor in Multifactor Priority

2020-09-23 Thread Marcus Boden
Hi Jianwen, yes, you can give different accounts or users specific extra-priorities. You can set it via sacctmgr: https://slurm.schedmd.com/sacctmgr.html#SECTION_GENERAL-SPECIFICATIONS-FOR-ASSOCIATION-BASED-ENTITIES (scroll down to 'Priority') Priority What priority will be added to a job's pr

Re: [slurm-users] Submitting jobs with constraint option

2020-09-03 Thread Marcus Boden
Hi, you can add those as "Features" for the nodes, see: https://slurm.schedmd.com/slurm.conf.html#OPT_Feature Best, Marcus On 9/3/20 2:52 PM, Gestió Servidors wrote: > Hello, > > I would like to apply some constraint options to my nodes. For example, > infiniband available, processor model, et

Re: [slurm-users] unable to start slurmd process.

2020-06-11 Thread Marcus Boden
Hi Navin, try running slurmd in the foreground with increased verbosity: slurmd -D -v (add as many v as you deem necessary) Hopefully it'll tell you more about why it times out. Best, Marcus On 6/11/20 2:24 PM, navin srivastava wrote: > Hi Team, > > when i am trying to start the slurmd process

Re: [slurm-users] MaxJobs not working

2020-05-18 Thread Marcus Boden
Hi, > Some minutes ago, I have applied "MaxJobs=3" for a user. After that, if I > ran "sacctmgr -s show user MYUSER format=account,user,maxjobs", system showed > a "3" at the maxjobs column. However, now, I have run a "squeue" and I'm > seeing 4 jobs (from that user) in "running" state... Shou

Re: [slurm-users] sacct returns nothing after reboot

2020-05-13 Thread Marcus Boden
Hi, the default time window starts at 00:00:00 of the current day: -S, --starttime Select jobs in any state after the specified time. Default is 00:00:00 of the current day, unless the '-s' or '-j' options are used. If the '-s' option is used, then the
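
Since the default window only starts at midnight of the current day, jobs from before a reboot yesterday will not show up unless a start time is given. A minimal sketch of building such a call follows; the helper name is hypothetical, and only the `--starttime` option quoted above is used.

```python
from datetime import datetime, timedelta


def sacct_args_since(days_back, now=None):
    """Build an sacct argv whose window starts at midnight `days_back` days ago."""
    now = now or datetime.now()
    start = (now - timedelta(days=days_back)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return ["sacct", "--starttime", start.strftime("%Y-%m-%dT%H:%M:%S")]


# With days_back=0 this reproduces the documented default window.
print(" ".join(sacct_args_since(7)))
```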

Re: [slurm-users] Question about SacctMgr....

2020-02-28 Thread Marcus Boden
Hi, you're looking for 'associations' between users, accounts and their limits. Try `sacctmgr show assoc [tree]` Best, Marcus On 20-02-28 09:38, Matthias Krawutschke wrote: > Dear Slurm-User, > > > > I have a simple question about User and Account – Management on SLURM. > > > > How can I f

[slurm-users] Inconsistent cpu bindings with cpu-bind=none

2020-02-17 Thread Marcus Boden
Hi everyone, I am facing a bit of a weird issue with CPU bindings and mpirun: My jobscript: #SBATCH -N 20 #SBATCH --tasks-per-node=40 #SBATCH -p medium40 #SBATCH -t 30 #SBATCH -o out/%J.out #SBATCH -e out/%J.err #SBATCH --reservation=root_98 module load impi/2019.4 2>&1 export I_MPI_DEBUG=6 exp

Re: [slurm-users] slurmstepd: error: _is_a_lwp

2020-02-04 Thread Marcus Boden
We had this issue recently. Some googling led me to the NERSC FAQs, which state: > _is_a_lwp is a function called internally for Slurm job accounting. The > message indicates a rare error situation with a function call. But the error > shouldn't affect anything in the user job. Please ignore t

Re: [slurm-users] Upgrade or /-date to Release 20.02p1 .....

2020-02-04 Thread Marcus Boden
Hi, to your first question: I don't know the exact reason, but SchedMD made it pretty clear that there is a specific sequence for updates: slurmdbd -> slurmctld -> slurmd -> commands See https://slurm.schedmd.com/SLUG19/Field_Notes_3.pdf (or any of the other field notes) for details. So, I'd advis

Re: [slurm-users] Nodes going into drain because of "Kill task failed"

2019-10-22 Thread Marcus Boden
you can also use the UnkillableStepProgram to debug things: > UnkillableStepProgram > If the processes in a job step are determined to be unkillable for a > period of time specified by the UnkillableStepTimeout variable, the program > specified by UnkillableStepProgram will be executed. This

Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-08 Thread Marcus Boden
Hi Jürgen, you're looking for KillOnBadExit in the slurm.conf: KillOnBadExit If set to 1, a step will be terminated immediately if any task is crashed or aborted, as indicated by a non-zero exit code. With the default value of 0, if one of the processes is crashed or aborted the other proces

[slurm-users] Monitoring with Telegraf

2019-09-26 Thread Marcus Boden
Hey everyone, I am using Telegraf and InfluxDB to monitor our hardware and I'd like to include some slurm metrics into this. Is there already a telegraf plugin for monitoring slurm I don't know about, or do I have to start from scratch? Best, Marcus -- Marcus Vincent Boden, M.Sc. Arbeitsgruppe e

Re: [slurm-users] slurm node weights

2019-09-05 Thread Marcus Boden
Hello Doug, to quote the slurm.conf page: It would be preferable to allocate smaller memory nodes rather than larger memory nodes if either will satisfy a job's requirements. So I guess the idea is that if a smaller node satisfies all requirements, why 'waste' a bigger one for it? It makes sense

Re: [slurm-users] Job error when using --job-name=`basename $PWD`

2019-07-28 Thread Marcus Boden
Hi Fabio, are you sure that command substitution works in the #SBATCH part of the jobscript? I don't think that slurm actually evaluates that, though I might be wrong. It seems like the #SBATCH after the --job-name line are not evaluated anymore, therefore you can't start srun with two tasks (since
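
One possible workaround, sketched here under the assumption that the job name should simply be the name of the submission directory: compute the name outside the jobscript and pass it on the sbatch command line, where options take precedence over #SBATCH lines. The helper name, `job.sh` and the example directory are hypothetical.

```python
import os


def build_sbatch_args(script, workdir):
    """Build an sbatch argv that names the job after the working directory."""
    # normpath drops a trailing slash so basename does not return "".
    job_name = os.path.basename(os.path.normpath(workdir))
    return ["sbatch", f"--job-name={job_name}", script]


# Hypothetical script name and directory, for illustration only.
print(" ".join(build_sbatch_args("job.sh", "/home/user/run_42/")))
```

The equivalent on the shell would be to put the `basename $PWD` substitution on the sbatch command line itself, since the shell expands it there before sbatch ever sees the argument.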

Re: [slurm-users] Hints, Cheatsheets, etc

2019-07-09 Thread Marcus Boden
> > Yeah, on our systems, I get: > Sorry, gawk version 4.0 or later is required. Your version is: GNU Awk > 3.1.7 > (RHEL 6). So this one wasn't as useful for me. But thanks anyway! Just an FYI: Building gawk locally is pretty easy (a simple configure, make, make install), so that might b

Re: [slurm-users] gpu count

2019-06-27 Thread Marcus Boden
Hi, this is usually due to a misconfiguration in your gres.conf (at least it was for me). Can you show your gres.conf? Best, Marcus On 19-06-27 15:33, Valerio Bellizzomi wrote: > hello, my node has 2 gpus so I have specified gres=gpus:2 but the > scontrol show node displays this: > > State=IDLE