Look into the documentation on a QOS on the sacctmgr page. A QOS can be
defined via sacctmgr, and that QOS can be attached to the partition to allow
for more restrictions than just the partition definition allows.
One of the settings for a QOS is "MaxTRESPerJob", so setting that to "cpu=8" would cap any job using that QOS (and therefore that partition) at 8 CPUs.
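As a sketch (the QOS and partition names here are made up):
# create the QOS and set the per-job limit
sacctmgr add qos cpu8
sacctmgr modify qos cpu8 set MaxTRESPerJob=cpu=8
# then in slurm.conf, attach the QOS to the partition
PartitionName=limited Nodes=node[01-04] QOS=cpu8
Any job submitted to that partition then picks up the 8-CPU cap, on top of whatever the partition definition itself allows.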
I'll note that the SLURM_CONF env var is listed on the sbatch docs page, so it is likely not an override for all slurm commands.
From: Groner, Rob via slurm-users
Sent: Tuesday, January 7, 2025 9:04 AM
To: slurm-users@lists.schedmd.com ; Sven Schulze
Su
The config file location is set during the ./configure step when building the source code. I believe the option is --sysconfdir (run ./configure --help to confirm the exact syntax). After configure, rebuild and install, and slurm will then look in that new location.
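For example (the prefix and path here are just illustrative):
./configure --prefix=/opt/slurm --sysconfdir=/etc/slurm
make && make install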
There is also a SLURM_CONF env var that can point slurm commands at a different config file location.
I'm not entirely sure, and I can't vouch for differences in a (relatively) older version of slurm. But I'm pretty sure that on our cluster we have to specify the GRES in the partition in order for Slurm to treat them as allocatable resources. On our interactive nodes, we have GPUs but we don't
So, the overall answer is yes. But there's a lot to it. I can't detail everything we did to get to where you're talking about, but let me try to hit some high points. Basically, google anything I'm about to mention to get a fuller explanation:
* You can set the "cost" of things per partition via the TRESBillingWeights partition setting.
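A rough sketch of what that looks like in slurm.conf (the weights are made up):
PartitionName=gpu Nodes=gpu[01-04] TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=10.0"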
Maybe I'm reading it wrong, but your partition sets DefMemPerGPU at 32000 and
the nodes only have 31000 real memory available.
Rob
From: Jörg Striewski via slurm-users
Sent: Wednesday, October 16, 2024 4:05 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-u
Let's imagine we have 3 empty nodes and a 200G/user/node limit. If a user submits 10 jobs each requesting 100G of memory, there should be 2 jobs running on each worker and 4 jobs pending.
Guillaume
From: "Groner, Rob" <mailto:rug...@psu.e
I don't know it offhand. I was thinking maybe some combination of qos and partition and account limits.
Rob
From: Guillaume COCHARD
Sent: Tuesday, September 24, 2024 10:58 AM
To: Groner, Rob
Cc: slurm-users@lists.schedmd.com
Subject: Re: Max TRES per user and node
Rob
From: Guillaume COCHARD
Sent: Tuesday, September 24, 2024 10:09 AM
To: Groner, Rob
Cc: slurm-users@lists.schedmd.com
Subject: Re: Max TRES per user and node
Thank you for your answer.
To test it I tried:
sacctmgr update qos normal set maxtresperuser=cpu=2
# Then in slurm
You have the right idea.
On that same page, you'll find MaxTRESPerUser, as a QOS parameter.
You can create a QOS with the restrictions you'd like, and then in the
partition definition, you give it that QOS. The QOS will then apply its
restrictions to any jobs that use that partition.
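To sanity-check it afterwards, something like this should show the limit and the partition association (the QOS/partition names and format fields may differ on your side):
sacctmgr show qos normal format=Name,MaxTRESPU%30
scontrol show partition open | grep -i qos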
Rob
When you updated your operating system, you likely updated the version of slurm
you were using too (assuming slurm had been installed from system repos instead
of built source code). Slurm only supports db and state files that are within
2 major versions older than itself.
The fix is to uninst
Since you mentioned "an alternate configuration file", look at the bottom of
the sbatch online docs. It describes a SLURM_CONF env var you can set that
points to the config files.
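For example (the path is a placeholder):
export SLURM_CONF=/path/to/alternate/slurm.conf
sbatch job.sh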
Rob
____
From: Groner, Rob via slurm-users
Sent: Monday, May 20, 2024
It gets them from the slurm.conf file. So wherever you are executing
srun/sbatch/etc, it should have access to the slurm config files.
From: Alan Stange via slurm-users
Sent: Monday, May 20, 2024 2:55 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] R
FYI, I submitted a bug about this in March because the "compatible" line in the
docs was confusing to me as well. The change coming to the docs removes that
altogether and simply says that setting it to OFF "disables job preemption and
gang scheduling". Much clearer.
And we do it the same way
Marko,
We are running 23.02.6 and have a partition with a specific account set in
AllowAccounts. We test that only that account can use that partition, and it
works. I'll note that you'll need to set EnforcePartLimits=ALL in slurm.conf
for it to work, and if you use the job_submit filter, mak
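Anyway, for reference, the relevant pieces in slurm.conf are along these lines (partition and account names are made up):
EnforcePartLimits=ALL
PartitionName=private Nodes=node[01-04] AllowAccounts=specialacct State=UP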
Ya, I'm kinda looking at exactly this right now as well. For us, I know we're
under-utilizing our hardware currently, but I still want to know if the number
of pending jobs is growing because that would probably point to something going
wrong somewhere. It's a good metric to have.
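If it's useful, the quick-and-dirty way to pull that number is something like:
# count currently pending jobs, e.g. to feed a metrics collector
squeue -h -t PENDING | wc -l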
We are goi
Did you have --with-nvml as part of your configuration? Go back to your
config.log and verify that it ever said it found nvml.h.
If not, then you'll need to make sure you have the right nvidia/cuda packages
installed on the host you're building slurm on, and you might have to specify
--with-nvml.
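Roughly, the checks would be (the CUDA path is just an example):
grep -i nvml config.log                  # look for the "checking for nvml.h" result
./configure --with-nvml=/usr/local/cuda  # point configure at wherever nvml.h and libnvidia-ml.so live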
Thanks for doing that, as I did not see this original message, and I also am
having to look at configuring our log for rotation. We once accidentally
turned on debug5 and didn't notice until other things started failing because
the drive was full...from that ONE file.
I did find this conversat
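For what it's worth, the kind of logrotate stanza we're looking at is roughly this; the paths and signal handling are assumptions to adapt to your install (the slurm daemons re-open their logs on SIGUSR2):
/var/log/slurm/*.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
    postrotate
        systemctl kill -s SIGUSR2 slurmctld slurmd slurmdbd 2>/dev/null || true
    endscript
}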
It is my understanding that it is a different issue than pmix. So to be fully
protected, you would need to build the latest/fixed pmix and rebuild slurm
using that (or just keep pmix disabled), AND have this latest version of slurm
with their fix for their own vulnerability.
Rob
_
elp you when you are
looking into this.
Sent from my iPhone
On Sep 29, 2023, at 16:10, Groner, Rob wrote:
I'm not looking for a one-time answer. We run these tests anytime we change anything related to slurm: version, configuration, etc. We certainly run the test after the syst
than a "hallway comment", that it
sounds like a good thing which I would test with a simulator, if I had one.
I've been intrigued by (but really not looked much into)
https://slurm.schedmd.com/SLUG23/LANL-Batsim-SLUG23.pdf
On Fri, Sep 29, 2023 at 10:05 AM Groner, Rob
mailto:rug.
On our system, for some partitions, we guarantee that a job can run at least an
hour before being preempted by a higher priority job. We use the QOS preempt
exempt time for this, and it appears to be working. But of course, I want to
TEST that it works.
So on a test system, I start a lower pr
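The message is cut off here, but that kind of test would look roughly like this (the QOS/partition names are placeholders):
sacctmgr modify qos open set PreemptExemptTime=01:00:00   # guarantee an hour before preemption
sbatch -p open --wrap="sleep 7200"                        # lower priority job that should survive the first hour
sbatch -p prio --wrap="sleep 600"                         # higher priority job submitted right behind it
squeue -o "%i %P %T %M"                                   # watch whether the first job stays RUNNING past the hour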
slurmd, slurmctld, and slurmdbd can all run on different versions so long as slurmdbd >= slurmctld >= slurmd. So if you want to do a live upgrade you can do it. However, out of paranoia we generally stop everything. The entire process takes about an hour start to finish, with the longest part being the pausing
Ryan Novosielski - novos...@rutgers.edu
Sr. Technologist - 973/972.0922 (2x0922) - RBHS Campus
Office of Advanced Research Computing - MSB A555B, Newark
Rutgers, The State University of NJ
On Sep 28, 2023, at 11:58,
There are 14 steps to upgrading slurm listed on their website, including shutting
down and backing up the database. So far we've only updated slurm during a
downtime, and it's been a major version change, so we've taken all the steps
indicated.
We now want to upgrade from 23.02.4 to 23.02.5.
O
Yes!
Thanks. I'll try to remember it for next time.
There's a builtin slurm command, I can't remember what it is and google is failing me, that will take a compacted list of nodenames and return their full names, and I'm PRETTY sure it will do the opposite as well (what you're asking for).
It's probably sinfo or scontrol...maybe an sutil if tha
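For the record, I'm fairly sure the commands being half-remembered here are scontrol's hostnames/hostlist subcommands:
scontrol show hostnames t-gc-[1201-1203]      # expand a compact nodelist into full names
scontrol show hostlist t-gc-1201,t-gc-1202    # compact a list of names back down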
Ya, I agree about the invalid argument not being much help.
In times past when I encountered issues like that, I typically tried:
* Restart slurmd on the compute node. Watch its log to see what it complains about. Usually it's about memory.
* Set the configuration of the node to whatever
I didn't see this thread before, so maybe this has already been suggested...
When submitting jobs with sbatch, you can specify a list of partitions to use, and slurm will send the jobs to the partition with the earliest start/highest priority first, and if that gets "full" then it will send them to the next one in the list.
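Something along these lines (the partition names are placeholders):
sbatch --partition=prio,open --time=1:00:00 job.sh   # slurm uses whichever listed partition can start it first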
s supported since
version 23.02
On 24/07/2023 23:26, Groner, Rob wrote:
> I've setup a partition THING with AllowAccounts=stuff. I then use
> sacctmgr to create the stuff account and a mystuff account whose parent
> is stuff. My understanding is that this would make mystuff a subacco
I've setup a partition THING with AllowAccounts=stuff. I then use sacctmgr to
create the stuff account and a mystuff account whose parent is stuff. My
understanding is that this would make mystuff a subaccount of stuff.
The description for specifying allowaccount in a partition definition in
I'm not sure I can help with the rest, but the EnforcePartLimits setting will only reject a job at submission time that exceeds partition limits, not overall cluster limits. I don't see anything, offhand, in the interactive partition definition that is exceeded by your request for 4 GB/CPU.
Rob
At some point when we were experimenting with MIG, I was being entirely
frustrated in getting it to work until I finally removed the autodetect from
gres.conf and explicitly listed the stuff instead. THEN it worked. I think
you can find the list of files that are the device files using nvidia
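The message is cut off, but for reference, an explicitly-listed gres.conf (no AutoDetect) looks roughly like this; the node name and device paths are examples only, and as I recall MIG instances also need their matching /dev/nvidia-caps entries listed:
NodeName=gpu-node01 Name=gpu Type=a100 File=/dev/nvidia0
NodeName=gpu-node01 Name=gpu Type=a100 File=/dev/nvidia1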
45.027] mcs: MCSParameters = (null). ondemand set.
>> [2023-07-18T14:59:45.028] _slurm_rpc_reconfigure_controller: completed
>> usec=5898
>> [2023-07-18T14:59:45.952]
>> SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max
That would certainly do it. If you look at the slurmctld log when it comes up, it will say that it's marking that node as invalid because it has fewer (0) gres resources than you say it should have. That's because slurmd on that node will come up and say "What gres resources??"
For testing purposes
A quick test to see if it's a configuration error is to set config_overrides in
your slurm.conf and see if the node then responds to scontrol update.
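If I remember the parameter name right, that's:
# slurm.conf: take the node definitions at face value instead of what slurmd reports
SlurmdParameters=config_overrides
Then restart slurmctld and try scontrol update nodename=<node> state=resume again.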
From: slurm-users on behalf of Brian
Andrus
Sent: Thursday, May 25, 2023 10:54 AM
To: slurm-users@lists.schedmd
What you are describing is definitely doable. We have our system set up similarly. All nodes are in the "open" partition and the "prio" partition, but a job submitted to the "prio" partition will preempt the open jobs.
I don't see anything clearly wrong with your slurm.conf settings. Ours are
ver
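The original message is cut off here; for comparison, the relevant lines in a setup like this look roughly as follows (node lists and tier numbers are placeholders):
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
PartitionName=open Nodes=node[01-10] Default=YES PriorityTier=1 PreemptMode=SUSPEND
PartitionName=prio Nodes=node[01-10] PriorityTier=10 PreemptMode=OFF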
ahead of any of the heavy user’s pending jobs automatically?
From: slurm-users on behalf of "Groner,
Rob"
Reply-To: Slurm User Community List
Date: Wednesday, May 17, 2023 at 1:09 PM
To: "slurm-users@lists.schedmd.com"
Subject: Re: [slurm-users] On the ability of co
its of what the default permissions of a
coordinator can do.
Of course, that still may not work if there are other accounts/partitions/users
with higher priority jobs than User B. Specifically if those jobs can use the
same resources A's jobs are running on.
Brian Andrus
On 5/17/2023 10
hen they will run next.
That will work if you are able to wait for some jobs to finish and you can
'skip the line' for the priority jobs.
If you need to preempt running jobs, that would take a bit more effort to set
up, but is an alternative.
Brian Andrus
On 5/17/2023 6:40 A
I was asked to see if coordinators could do anything in this scenario:
* Within the account that they coordinated, User A submitted 1000s of jobs
and left for the day.
* Within the same account, User B wanted to run a few jobs really quickly.
Once submitted, his jobs were of course behi
I'm trying to puzzle out using QOS-based preemption instead of partition-based
so we can have the juicy prize of PreemptExemptTime. But in the process, I've
encountered something that puzzles ME.
I have 2 partitions that, for the purposes of testing, are identical except for
the QOS they have
once it has run through its PreemptExemptTime. It never gets preempted.
Thanks.
Rob
From: slurm-users on behalf of
Christopher Samuel
Sent: Tuesday, March 7, 2023 3:40 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] PreemptExemptTime
On 3/7/
h 7, 2023 3:40 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] PreemptExemptTime
On 3/7/23 6:46 am, Groner, Rob wrote:
> Our global settings are PreemptMode=SUSPEND,GANG and
> PreemptType=preempt/partition_prio. We have a high priority partition
> that nothing should e
preempt exempt time (unless that comes
from the global setting).
Thanks.
Rob
From: slurm-users on behalf of
Christopher Samuel
Sent: Tuesday, March 7, 2023 3:40 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] PreemptExemptTime
On 3/7/23 6:46
I found a thread about this topic that's a year old and at that time seemed to
give no hope, I'm just wondering if the situation has changed. My testing so
far isn't encouraging.
In the thread (here: https://groups.google.com/g/slurm-users/c/yhnSVBoohik) it
talks about wanting to give lower pr
I'm trying to set up some testing of our job_submit.lua plugin so I can verify that changes I make to it don't break anything.
I looked into luaunit for testing, and that seems like it would do what I need: let me set the value of inputs, call the slurm_job_submit() function with them, and the
No, there's no other official documentation of that. The official docs also
say to go to that source file and see the fields there. It's what I do also.
Rob
From: slurm-users on behalf of Chrysovalantis Paschoulas
Sent: Wednesday, February 8, 2023 11:46 AM
To
ularity shell
/opt/shared/singularity/prebuilt/postgresql/13.2.simg
salloc: Granted job allocation 3953723
salloc: Waiting for resource configuration
salloc: Nodes r1n00 are ready for job
Singularity>
On Feb 8, 2023, at 09:47 , Groner, Rob mailto:rug...@psu.edu>>
wrote:
I tried th
I tried that, and it says the nodes have been allocated, but it never comes to
an apptainer prompt.
I then tried doing them in separate steps. Doing salloc works, I get a prompt
on the node that was allocated. I can then run "singularity shell " and
get the apptainer prompt. If I prefix that
r_run.sh
Then cluster_run.sh would call sbatch along with the appropriate commands.
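A bare-bones sketch of that wrapper idea (the script name and image path are made up):
#!/bin/bash
# cluster_run.sh - submit the user's script so it runs inside the container on the node
user_script="$1"; shift
sbatch "$@" --wrap="singularity exec /opt/images/runtime.simg bash ${user_script}"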
Brian Andrus
On 2/7/2023 9:31 AM, Groner, Rob wrote:
I'm trying to setup the capability where a user can execute:
$: sbatch script_to_run.sh
and the end result is that a job is created on a node, and that
I'm trying to setup the capability where a user can execute:
$: sbatch script_to_run.sh
and the end result is that a job is created on a node, and that job will
execute "singularity exec script_to_run.sh"
Also, that they could execute:
$: salloc
and would end up on a node per their paramet
m User Community List
Subject: Re: [slurm-users] Using oversubscribe to hammer a node
Hi Rob,
"Groner, Rob" writes:
> I'm trying to setup a specific partition where users can fight with the OS
> for dominance, The oversubscribe property sounds like what I want, as it says
I'm trying to set up a specific partition where users can fight with the OS for dominance. The OverSubscribe property sounds like what I want, as it says "More than one job can execute simultaneously on the same compute resource." That's exactly what I want. I've set up a node with 48 CPU and o
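The message is cut off, but the partition line being described would look roughly like this (names and the oversubscribe factor are placeholders):
PartitionName=scrum Nodes=node01 OverSubscribe=FORCE:4 MaxTime=8:00:00 State=UP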
Generating the *.conf files from parseable/testable sources is an interesting
idea. You mention nodes.conf and partitions.conf. I can't find any
documentation on those. Are you just creating those files and then including
them in slurm.conf?
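I'm assuming that would just be the Include directive, i.e. something like:
Include /etc/slurm/nodes.conf
Include /etc/slurm/partitions.conf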
Rob
From: slurm-
ll running happily.
If it succeeds, it takes control back and you can then restart the secondary
with the new (known good) config.
Brian Andrus
On 1/17/2023 12:36 PM, Groner, Rob wrote:
So, you have two equal sized clusters, one for test and one for production?
Our test cluster is a small
time (after the maintenance
reservation) to ensure that MPI runs correctly.
On Wed, Jan 4, 2023 at 12:26 PM Groner, Rob
mailto:rug...@psu.edu>> wrote:
We currently have a test cluster and a production cluster, all on the same
network. We try things on the test cluster, and then we gath
We currently have a test cluster and a production cluster, all on the same
network. We try things on the test cluster, and then we gather those changes
and make a change to the production cluster. We're doing that through two
different repos, but we'd like to have a single repo to make the tra
just 1 gpu, without them going to pending (until all gpus are used up).
Rob
From: slurm-users on behalf of Groner,
Rob
Sent: Thursday, November 17, 2022 10:08 AM
To: Slurm User Community List
Subject: Re: [slurm-users] NVIDIA MIG question
No, I can't s
emory or cpu),
or some other limit (in the account, partition, or qos)
On our setup we're limiting jobs to 1 gpu per job (via partition qos), however
we can use up all the MIGs with single gpu jobs.
On Wed, 16 Nov 2022 at 23:48, Groner, Rob
mailto:rug...@psu.edu>> wrote:
That does h
a single job could use all 14 instances. The result you observed suggests that MIG is a feature of the driver, i.e. lspci shows one device but nvidia-smi shows 7 devices.
I haven't played around with this myself in slurm but would be interested to
know the answers.
Laurence
On 15/11/2022
We have successfully used the nvidia-smi tool to take the 2 A100's in a node
and split them into multiple GPU devices. In one case, we split the 2 GPUS
into 7 MIG devices each, so 14 in that node total, and in the other case, we
split the 2 GPUs into 2 MIG devices each, so 4 total in the node.
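For anyone searching later, the nvidia-smi side of that kind of split is roughly the following (from memory; the exact profile names depend on the GPU model and memory size):
nvidia-smi -i 0 -mig 1                                                         # enable MIG mode on GPU 0
nvidia-smi mig -i 0 -cgi 1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb -C   # carve it into 7 instances
nvidia-smi -L                                                                  # list the resulting MIG devices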
users on behalf of Michael
Lewis
Reply-To: Slurm User Community List
Date: Friday, November 11, 2022 at 10:01 AM
To: Slurm User Community List
Subject: Re: [slurm-users] NVML not found when Slurm was configured.
Thanks Rob! No I just grabbed it through apt. I’ll try that now.
Mike
Fr
Hi Mike,
I can't tell if you're compiling slurm or not on your own. You will have to if
you want the functionality.
On RedHat8, I had to install cuda-nvml-devel-11-7, so find what the equivalent
is for that in Ubuntu. Basically, whatever package includes nvml.h and
libnvidia-ml.so. Then, mo
A very helpful reply, thank you!
For your "special testing config", do you just mean the
slurm.conf/gres.conf/*.conf files? So when you want to test a new version of
slurm, you replace the conf files and then restart all of the daemons?
Rob
I'm really pleased to find the test suite included with slurm, and after some
initial difficulty, I now am able to run the unit tests and expect tests.
The expect tests seem to generally be failing whenever the test involves tasks.
Anything asking for more than 1 task per node is failing.
[202
I've encountered that many times, and for me, it was always related to
AutoDetect and the nvidia-ml library. Does your slurmd log contain a line like
"debug: skipping GRES for NodeName=t-gc-1202 AutoDetect=nvml"? I see that
you didn't specifically set AutoDetect to nvml in gres.conf, but may
y doing such for a fairly limited amount of information
which presumably does not change frequently, perhaps it would be better to have
a cron job periodically output the desired information to a file, and have the
job_submit.lua read the information from the file?
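Roughly, the cron side of that suggestion (interval, path, and format string are arbitrary):
*/5 * * * * /usr/bin/sinfo -h -N -o "%N %f" > /etc/slurm/node_features.txt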
On Tue, Oct 11, 2022 at 5:17 P
I am testing a method where, when a job gets submitted asking for specific
features, then, if those features don't exist, I'll do something.
The job_submit.lua plugin has worked to determine when a job is submitted
asking for the specific features. I'm at the point of checking if those
feature
Have you checked the logs for slurmd and slurmctld? I seem to recall that the
"invalid" state for a node meant that there was some discrepancy between what
the node says or thinks it has (slurmd -C) and what the slurm.conf says it has.
While there is that discrepancy and the node is invalid, y
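In any case, a concrete way to check for that discrepancy, run on the compute node (the config path may differ):
slurmd -C                                              # what the node actually detects
grep "NodeName=$(hostname -s)" /etc/slurm/slurm.conf   # what slurm.conf claims for it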
Thanks. I tried that, and it seems like it may be exactly what I was looking
for.
Rob
I'm trying to setup a system where, when a job from a certain account is
submitted, if no nodes are available that have a specific feature, then the job
will be paused/held/pending and a node will be dynamically created with that
feature.
I can now dynamically bring up the node with the feature
I ended up getting some help, and in the process, I noticed (for the first
time) that the topology plugin was listed in the slurm.conf file. I remembered
that the dynamic nodes docs mentioned that dynamic nodes was not compatible
with the topology plugin. I had previously removed the nodes fro
I tried a simpler test, removing the features altogether so it was just another
node offering 48 CPUs. I then started jobs asking for 24 CPUs a bunch of
times. The jobs started on every node EXCEPT t-gc-1201, and jobs went pending
for resources until the "normal" nodes could return.
So at th
I have 2 nodes that offer a "gc" feature. Node t-gc-1202 is "normal", and node
t-gc-1201 is dynamic. I can successfully remove t-gc-1201 and bring it back
dynamically. Once I bring it back, that node appears JUST LIKE the "normal"
node in the sinfo output, as seen here:
[rug262@testsch (RC)
make you feel slurmd cannot run as a service on a dynamic
node. As long as you added the options to the systemd defaults file for it, you
should be fine (usually /etc/defaults/slurmd)
Brian
On 9/23/2022 7:40 AM, Groner, Rob wrote:
Ya, we're still working out the mechanism for taking the nod
pt outside slurm itself, on the head
node. You can use ssh/pdsh to connect to a node and execute things there while
it is out of the mix.
Brian Andrus
On 9/23/2022 7:09 AM, Groner, Rob wrote:
I'm working through how to use the new dynamic node features in order to take
down a particular
I'm working through how to use the new dynamic node features in order to take
down a particular node, reconfigure it (using nvidia MIG to change the number
of graphic cores available) and give it back to slurm.
I'm at the point where I can take a node out of slurm's control from the master node.
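A sketch of that cycle with the dynamic-node commands as I understand them (the node name and gres string are placeholders):
scontrol update NodeName=t-gc-1201 State=DRAIN Reason="MIG reconfig"   # drain work off the node
scontrol delete NodeName=t-gc-1201                                     # remove it from the controller
# ...reconfigure MIG on the node with nvidia-smi...
slurmd -Z --conf "Gres=gpu:7"                                          # start slurmd back up as a dynamic node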