Perhaps you could provide the exact error message or log output from a failed attempt?
Brian Andrus
On 4/29/2025 7:56 AM, milad--- via slurm-users wrote:
My partition definitions are super simple:
```
PartitionName=t4 Nodes=slurm-t4-[1-30] DEFAULT=YES MaxTime=INFINITE State=UP
DefCpuPerGPU=16
Nodes=slurm-a100-80gb-[1-30] MaxTime=INFINITE State=UP
DefCpuPerGPU=12 DefMemPerGPU=85486
```
I looked at the PARTITION CONFIGURATION section of the slurm.conf page but don't see
anything there that relates to multiple partitions and/or the number of tasks.
--
Perhaps some of the partition's defaults (maybe even implicit ones) are to blame?
On Mon, Apr 28, 2025 at 7:56 AM milad--- via slurm-users <
slurm-users@lists.schedmd.com> wrote:
> Update: I also noticed that specifying --ntasks makes a difference when
> --gpus is present.
>
> i
is
configured to the total thread count on multi-threaded hardware and no other
topology info ("Sockets=", "CoresPerSocket", etc.) is configured.
[…]
So, I suppose, in version 23.11.7 SLURM corrected that behaviour. Could someone
confirm that?
Thanks.
--
ntend to achieve with CPUs=... if the host is single-socket?
https://slurm.schedmd.com/gres.conf.html#OPT_Cores has your answer though, I
think.
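For readers landing here: Cores= in gres.conf (the successor to the older CPUs= spelling, as far as I recall) just ties a GRES to the cores of the socket it is attached to, so on a single-socket host it buys nothing. A minimal sketch, with a hypothetical node name, type, and device paths:
```
# gres.conf sketch -- names and paths are placeholders, not from the thread
# Pin each GPU to the cores of its local socket; on a single-socket box the
# Cores= field can simply be omitted and slurmd will work it out itself.
NodeName=gpu01 Name=gpu Type=a100 File=/dev/nvidia0 Cores=0-15
NodeName=gpu01 Name=gpu Type=a100 File=/dev/nvidia1 Cores=16-31
```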
--
,h100 --gpus h100:1 script.sh
Adding --ntasks: works
✅ sbatch -p a100,h100 --gpus h100:1 --ntasks 1 script.sh
--
groups.google.com/g/slurm-users/c/UOUVfkajUBQ
--
VAL with
reason:gres/gpu GRES core specification 0-1 for node aopcvis5 doesn't match
socket boundaries. (Socket 0 is cores 0-3)
Where is my configuration error?
Thanks.
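For context, the error above is slurmd objecting that Cores=0-1 covers only part of a socket; the usual fix is to align the gres.conf core list with the socket reported in the message (cores 0-3), or to drop Cores= entirely. A sketch only, using the node name from the error and a hypothetical device path:
```
# gres.conf sketch -- device path is a placeholder
NodeName=aopcvis5 Name=gpu File=/dev/nvidia0 Cores=0-3
```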
--
n
job_desc.qos = "gpu_interactive"
end
if job_desc.partition == "cpu" then
job_desc.qos = "cpu_interactive"
end
return slurm.SUCCESS
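Ewan notes later in this archive that interactive jobs are identified by the batch script being empty in job_submit.lua. A self-contained sketch of that idea; the QOS names come from the snippet above, while the "gpu" partition name and the exact empty-script test are assumptions:
```
-- job_submit.lua sketch: route script-less (interactive) submissions to an
-- interactive QOS. Partition/QOS names mirror the snippet above.
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.script == nil or job_desc.script == '' then
        if job_desc.partition == "gpu" then
            job_desc.qos = "gpu_interactive"
        end
        if job_desc.partition == "cpu" then
            job_desc.qos = "cpu_interactive"
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```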
Thanks
Ewan
-----Original Message-
From: Ole Holm Nielsen via sl
Us" is a limit on the number of CPUs the association
can use.
-- Michael
On Fri, Apr 18, 2025 at 8:01 AM Patrick Begou via slurm-users
wrote:
Hi all,
I'm trying to set up a QoS on a small 5-node cluster running Slurm
24.05.7. My goal is to
y one interactive job per user can be running at any
given time.
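For reference, "at most one running interactive job per user" is usually expressed as a QOS limit; a sketch with hypothetical names (MaxJobsPerUser caps concurrently running jobs, MaxSubmitJobsPerUser would also cap queued ones):
```
# Sketch -- the QOS name is a placeholder
sacctmgr add qos interactive
sacctmgr modify qos interactive set MaxJobsPerUser=1
# Then attach it, e.g. via the partition in slurm.conf
# (PartitionName=... QOS=interactive) or from job_submit.lua as in
# Ewan's snippet earlier on this page.
```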
Cheers,
René
Am 25.04.25 um 11:37 schrieb Ewan Roche via slurm-users:
Hello Ole,
the way I identify interactive jobs is by checking that the script is empty in
job_submit.lua.
If it's the case then they're assigned to an
Hi Ole,
Ole Holm Nielsen via slurm-users
writes:
> We would like to put limits on interactive jobs (started by salloc) so
> that users don't leave unused interactive jobs behind on the cluster
> by mistake.
>
> I can't offhand find any configurations that limit in
job_desc.qos = "gpu_interactive"
end
if job_desc.partition == "cpu" then
job_desc.qos = "cpu_interactive"
end
return slurm.SUCCESS
Thanks
Ewan
-Original Message-
From: Ole Holm Nielsen via slurm-users
r
Department of Physics, Technical University of Denmark
--
Hmm.. Good idea. I'll start looking at that.
Thanks!
Jeff
On Thu, Apr 24, 2025 at 11:02 AM Cutts, Tim via slurm-users <
slurm-users@lists.schedmd.com> wrote:
> I wonder whether there might be core-pinning/NUMA topology/hyperthreading
> sort of thing going on here?
>
>
From: Michael DiDomenico via slurm-users
Date: Wednesday, 23 April 2025 at 7:53 pm
To:
Cc: Slurm User Community List
Subject: [slurm-users] Re: Job running slower when using Slurm
the
if you use any Python libs.
> Best,
>
> Feng
>
>
> On Wed, Apr 23, 2025 at 3:22 PM Jeffrey Layton via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> Roger. It's the code that prints out the threads it sees - I bet it is
>> the cgroups. I ne
Besides Slurm options, you might also need to set the OpenMP env variable:
export OMP_NUM_THREADS=32 (i.e. the core count, not the thread count)
Also set other similar env variables if you use any Python libs.
Best,
Feng
On Wed, Apr 23, 2025 at 3:22 PM Jeffrey Layton via slurm-users <
slurm-us
Thank you all. I had thought of writing my own, but I suspected it would be
too large of a time sink. Your nudges (and example script) have convinced
me otherwise, and in fact this is what I will do!
Thanks again!
On Tue, Apr 22, 2025 at 3:12 AM Bjørn-Helge Mevik via slurm-users <
slurm-us
n,
this is something in the code. Unfortunately, the code uses the time to
compute Mop/s total and Mop/s/thread so a longer time means slower
performance.
Thanks!
Jeff
On Wed, Apr 23, 2025 at 2:53 PM Michael DiDomenico via slurm-users <
slurm-users@lists.schedmd.com> wrote:
> the progr
the program probably says 32 threads, because it's just looking at the
box, not what slurm cgroups allow (assuming you're using them) for cpu
I think for an OpenMP program (not OpenMPI) you definitely want the
first command with --cpus-per-task=32
are you measuring the runtime inside the program or
I tried using ntasks and cpus-per-task to get all 32 cores. So I added
--ntasks=# --cpus-per-task=N to the sbatch command so that it now looks
like:
sbatch --nodes=1 --ntasks=1 --cpus-per-task=32
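A hedged sketch of how that usually gets wired into the batch script itself, so the OpenMP runtime sees the same core count Slurm actually granted (the binary name is a placeholder):
```
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
# Match OpenMP to the cgroup-limited allocation rather than the whole box
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./bt.C.x   # placeholder for the NPB binary
```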
Roger. I didn't configure Slurm so let me look at slurm.conf and gres.conf
to see if they restrict a job to a single CPU.
Thanks
On Wed, Apr 23, 2025 at 1:48 PM Michael DiDomenico via slurm-users <
slurm-users@lists.schedmd.com> wrote:
> without knowing anything about your en
, 2025 at 1:28 PM Jeffrey Layton via slurm-users
wrote:
>
> Good morning,
>
> I'm running an NPB test, bt.C that is OpenMP and built using NV HPC SDK
> (version 25.1). I run it on a compute node by ssh-ing to the node. It runs in
> about 19.6 seconds.
>
> Then I ru
Thanks!
Jeff
--
ts and the issue disappeared for now.
Best regards,
Frank
From: f...@adm.ku.dk
Sent: 14 April 2025 13:37
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Strange output of sshare
Hi all, I'm trying to clean up and reconfigure fair share on
n a user. Default is the
> cluster's limit. To clear a previously set value use the modify command
> with a new value of -1 for each TRES id.
>
>- sacctmgr(1)
>
> The "MaxCPUs" is a limit on the number of CPUs the association can use.
>
> -- Michael
>
Davide DelVento via slurm-users writes:
> I've gotten a request to have Slurm notify users for the typical email
> things (job started, completed, failed, etc) with a REST API instead of
> email. This would allow notifications in MS Teams, Slack, or log stuff in
> some inte
he cluster's limit. To clear a previously set value use the modify
command with a new value of -1 for each TRES id.
- sacctmgr(1)
The "MaxCPUs" is a limit on the number of CPUs the association can use.
-- Michael
On Fri, Apr 18, 2025 at 8:01 AM Patrick Begou via slurm-us
no goes down for timeout ping
so my question:
can we have an option to force DNS resolution instead of IP discovery in
dynamic mode?
(I tried the cloud_dns option, but it does not seem to be meant for this purpose)
best regards,
Stephane
--
on busy environments, but at least, Slurm can do it.
Cheers,
--
Kilian
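Along those lines, a minimal sketch of the wrapper approach: point MailProg at a script that forwards to a webhook. Everything here is an assumption to adapt (the argument layout slurmctld uses, the script path, the endpoint); it is not a documented interface contract:
```
#!/bin/bash
# Hypothetical MailProg replacement: slurm.conf would point at this script,
# e.g. MailProg=/usr/local/sbin/slurm-notify (path is a placeholder).
# Assumes the conventional mail(1)-style invocation "-s <subject> <recipient>";
# verify what your slurmctld actually passes before relying on it.
WEBHOOK_URL="https://example.invalid/webhook"   # placeholder endpoint
subject="$2"
recipient="$3"
payload=$(printf '{"text": "Slurm: %s (to: %s)"}' "$subject" "$recipient")
curl -fsS -X POST -H 'Content-Type: application/json' -d "$payload" "$WEBHOOK_URL"
```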
On Mon, Apr 21, 2025 at 11:46 AM Davide DelVento via slurm-users
wrote:
>
> Happy Monday everybody,
>
> I've gotten a request to have Slurm notify users for the typical email things
> (job started, complet
account[default=bad success=ok user_unknown=ignore] pam_sss.so
So how can we configure this to work around sssd?
On Sat, Apr 19, 2025 at 4:47 AM Ole Holm Nielsen via slurm-users <
slurm-users@lists.schedmd.com> wrote:
> Hi Robert,
>
> The pam_slurm_adopt has worked well and wit
ces, because my gut feeling is that somebody must have already had such
an itch to scratch!
Any other ideas about alternative ways to accomplish this?
Thanks
--
HPC | www.chpc.ac.za | NICIS | nicis.ac.za
Centre for High Performance Computing
--
"?
/Ole
On 18-04-2025 19:09, Robert Kudyba via slurm-users wrote:
Thanks Ole and Massimo, I definitely do not have UsePAM=1 in slurm.conf.
I commented out pam_systemd here:
grep pam_systemd *
fingerprint-auth:-session optional pam_systemd.so
fingerprint-auth-ac:-session
> I am asking because I had this problem when I configured the
> pam_slurm_adopt
>
> Cheers, Massimo
>
>
> On Fri, Apr 18, 2025 at 5:28 PM Robert Kudyba via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> In the instructions for pam_slurm_adopt
Hi
Did you disable the pam_systemd.so also from the module files included by
the sshd pam file ?
I am asking because I had this problem when I configured the
pam_slurm_adopt
Cheers, Massimo
On Fri, Apr 18, 2025 at 5:28 PM Robert Kudyba via slurm-users <
slurm-users@lists.schedmd.com>
internet, but that is bad advice :-(
You mention CentOS, but that OS has been dead for a long time...
IHTH,
Ole
On 18-04-2025 17:26, Robert Kudyba via slurm-users wrote:
In the instructions for pam_slurm_adopt <https://slurm.schedmd.com/
pam_slurm_adopt.html#ssh_config>, there are instru
So I must have missed something?
My partition (I have only one) in slurm.conf is:
PartitionName=genoa State=UP Default=YES MaxTime=48:00:00
DefaultTime=24:00:00 Shared=YES OverSubscribe=NO Nodes=osorno-0-[0-4]
Thanks
Patrick
--
sss(sshd:account): Access denied
for user user: 6 (Permission denied)
Apr 18 11:13:41 node11 sshd[33355]: fatal: Access denied for user user by
PAM account configuration [preauth]
Am I missing something?
--
ult is the cluster's
limit. To clear a previously set value use the modify command with a new
value of -1 for each TRES id.
- sacctmgr(1)
The "MaxCPUs" is a limit on the number of CPUs the association can use.
-- Michael
On Fri, Apr 18, 2025 at 8:01 AM Patrick Begou via
Hi Chris!
I didn't modify the cgroup configuration file; I only upgraded the Slurm
version.
After that, the limitations worked successfully.
It's quite odd.
lyz
--
Hiya!
On 16/4/25 12:56 am, lyz--- via slurm-users wrote:
I've tried version 23.11.10. It does work.
Oh that's wonderful, so glad it helped! It did seem quite odd that it
wasn't working for you before then. I wonder if this was a cgroups v1 vs
cgroups v2 thing?
All
d the GPUs using this command, I saw the expected number of
GPUs:
srun -p gpu --gres=gpu:2 --nodelist=node11 --pty nvidia-smi
Thank you very much for your guidance.
Best luck
Lyz
--
Hi, Chris.
The cgroup.conf on my GPU node is the same as on the head node. The contents are as
follows:
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
I'll try a higher Slurm version.
--
Hiya,
On 4/15/25 7:03 pm, lyz--- via slurm-users wrote:
Hi, Chris. Thank you for continuing to pay attention to this issue.
I followed your instruction, and this is the output:
[root@head1 ~]# systemctl cat slurmd | fgrep Delegate
Delegate=yes
That looks good to me, thanks for sharing that
On 4/15/25 6:57 pm, lyz--- via slurm-users wrote:
Hi, Sean. It's the latest slurm version.
[root@head1 ~]# sinfo --version
slurm 22.05.3
That's quite old (and no longer supported), the oldest still supported
version is 23.11.10 and 24.11.4 came out recently.
What does the cgroup
Hi, Chris. Thank you for continuing to pay attention to this issue.
I followed your instruction, and this is the output:
[root@head1 ~]# systemctl cat slurmd | fgrep Delegate
Delegate=yes
lyz
--
=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=/dev/nvidia2
Name=gpu File=/dev/nvidia3
Name=gpu File=/dev/nvidia4
Name=gpu File=/dev/nvidia5
Name=gpu File=/dev/nvidia6
Name=gpu File=/dev/nvidia7
# END AUTOGENERATED SECTION -- DO NOT REMOVE
--
On 4/15/25 12:55 pm, Sean Crosby via slurm-users wrote:
What version of Slurm are you running and what's the contents of your
gres.conf file?
Also what does this say?
systemctl cat slurmd | fgrep Delegate
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
--
What version of Slurm are you running and what's the contents of your gres.conf
file?
Sean
From: lyz--- via slurm-users
Sent: Tuesday, April 15, 2025 11:16:40 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: [EXT] Re: Issue with Enforcin
roups, you will be
"confined" to the resources assigned to this "last" job.
Is it possible in some way to specify the job to be mapped, in case there
are multiple jobs for that user on the same node ?
Thanks, Massimo
--
e the restriction applies to the physical GPU hardware, but it
doesn't take effect for CUDA.
--
You need to add
ConstrainDevices=yes
to your cgroup.conf and restart slurmd on your nodes. This is the setting that
gives access to only the GRES you request in your jobs.
Sean
From: lyz--- via slurm-users
Sent: Tuesday, April 15, 2025 8:29:41 PM
To: slurm
00)
if __name__ == "__main__":
test_gpu()
When I run this script, it still bypasses the resource restrictions set by
cgroup.
Are there any other ways to solve this problem?
--
On 4/14/25 6:27 am, lyz--- via slurm-users wrote:
This command is intended to limit user 'lyz' to using a maximum of 2 GPUs. However, when the user
submits a job using srun, specifying CUDA 0, 1, 2, and 3 in the job script, or
os.environ["CUDA_VISIBLE_DEVICES"] = &quo
or every user.
I'm grateful for any pointer for what to look for.
Best regards,
Frank
--
eing enforced as expected. How can I resolve this situation?
--
-overlap --jobid JOBIDNUM bash
>
>
>
> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>
>
>
> On Mon, 14 Apr 2025 4:30am, Massimo Sgaravatto via slurm-users wrote:
>
> >External Email - Use Caution
> >
> > Dear all
> >
> > With the pam_sl
Instead of using pam_slurm_adopt your users can get a shell on the node
of a specific job in that job's "mapped" space by running
srun --pty --overlap --jobid JOBIDNUM bash
-- Paul Raines (http://help.nmr.mgh.harvard.edu)
On Mon, 14 Apr 2025 4:30am, Massimo Sgaravatto
8,
"mode": "backup"
}
Other commands fail with:
"error_number": 1007,
"error": "Protocol authentication error",
I'll admit, I don't usually use sockets, so I could easily be
overlooking something there. Permissions on the socket look right. I am
getting JSON back, so it is connecting. Note: slurmrestd is running
under its own user (not root and not SlurmUser).
Any ideas?
Thanks in advance,
Brian Andrus
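For anyone debugging something similar, a minimal smoke test of the UNIX-socket path looks roughly like this (the socket path and OpenAPI plugin version are assumptions; with local socket authentication the connecting user's UID is what gets authenticated, so run it as a user slurmrestd will accept):
```
# Placeholder socket path and plugin version -- adjust to the deployment
curl -s --unix-socket /run/slurmrestd/slurmrestd.socket \
     'http://localhost/slurm/v0.0.40/diag'
```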
--
Director, R&D IT - Data, Analytics & AI, Scientific Computing Platform
AstraZeneca
From: Loris Bennett via slurm-
and Evolution
Deutscher Platz 6
04103 Leipzig
Room: U2.80
E-Mail: pierre_ab...@eva.mpg.de
Phone: +49 (0) 341 3550 245
--
know why, but this is OK.
Thanks,
Hiro
--
ch cluster
[root@vuwunicohpcdbp1 admjonesst1]#
regards
Steven
--
Lik duh
ty
regards
Steven Jones
B.Eng (Hons)
Technical Specialist - Linux RHCE
Victoria University, Digital Solutions,
Level 8 Rankin Brown Building,
Wellington, NZ
6012
0064 4 463 6272
From: Christopher Samuel via slurm-users
Sent: Thursday, 10
Hi Steven,
On 4/9/25 5:00 pm, Steven Jones via slurm-users wrote:
Apr 10 10:28:52 vuwunicohpcdbp1.ods.vuw.ac.nz slurmdbd[2413]: slurmdbd:
fatal: This host not configured to run SlurmDBD ((vuwunicohpcdbp1 or
vuwunicohp>
^^^ that's the critical error message, and it's reporting
pam_access.so
session required pam_unix.so
session optional pam_systemd.so
Accounting seems to be working OK.
Does anyone know what PAM related code paths could be triggered by
enabling accounting?
d
--
.tar.gz). Does anyone know
whether this is the best choice? Many thanks in advance for any advice.
Cheers,
-Frank
--
On 09-04-2025 18:23, Daniel Letai via slurm-users wrote:
Although 1.6.15 is latest and greatest, there is already a patch
https://lists.gnu.org/archive/html/freeipmi-devel/2025-02/msg0.html
for an issue that was severe enough to fail to build on fedora42
https://bugzilla.redhat.com
Hello Ole, hello Daniel,
Many thanks for the quick reply and the information. That was what I was
looking for. Thanks a lot.
Cheers,
-Frank
From: Daniel Letai via slurm-users
Sent: Wednesday, 9 April 2025 18:24
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: recommended
already in the new srpm in koji, so I would simply download that
and see if it can be built for rh
https://koji.fedoraproject.org/koji/buildinfo?buildID=2674703
On 09/04/2025 18:43, Ole Holm Nielsen
via slurm-users wrote:
Hi
Frank
/Slurm_configuration/#ipmi-power-monitoring
We have used the FreeIPMI plugin for a long time, and it works just great!
We just upgraded to Slurm 24.11.4 today :-)
On 4/9/25 17:19, Heckes, Frank via slurm-users wrote:
I’d like to update to SLURM version 24.11.4. I was searching for a
recommendation for freeIPMI
urmUser=slurm
StorageType=accounting_storage/mysql
StorageHost=gateway1
StoragePass=mypassword
StorageUser=slurm
Best regards,
Hiro
--
e-cases of jobs incorrectly pending held when --prefer
features are not initially satisfied.
-- slurmctld - Fix jobs incorrectly held when --prefer not satisfied in some
use-cases.
-- Ensure RestrictedCoresPerGPU and CoreSpecCount don't overlap.
--
On 4/6/25 22:56, Oren via slurm-users wrote:
Hi,
We set up a slurm system with email notification, this is the slurm.conf
`MailProg=/usr/sbin/sendmail`
But the email that I get has no status, just an empty message:
image.png
no subject, no info, what are we missing?
The image says that
variable that controls whether
srun is inside the same job or not. Unsetting SLURM_CPU_BIND is needed to
avoid "CPU binding outside of job step allocation".
Cheers
On Sat, Apr 5, 2025 at 3:39 PM Chris Samuel via slurm-users <
slurm-users@lists.schedmd.com> wrote:
> On 4/4/
Hi,
We set up a slurm system with email notification, this is the slurm.conf
`MailProg=/usr/sbin/sendmail`
But the email that I get has no status, just an empty message:
[image: image.png]
no subject, no info, what are we missing?
Thanks~
--
Computing - MSB A555B, Newark
On Apr 1, 2025, at 12:41, Patrick Begou via slurm-users
wrote:
Hi slurm team,
I would like to ask for some clarification about Slurm releases. Why are two versions of Slurm
available?
I am speaking of 24.05.7 versus 24.11.3 on
https://www.schedmd.com/slurm-support/re
de) because it would mean having 3 partitions
(if I have got it right): two partitions for CPU-only jobs and one partition
for GPU jobs
Many thanks, Massimo
[*] https://groups.google.com/g/slurm-users/c/IUd7jLKME3M
[**] https://groups.google.com/g/slurm-users/c/o7AiYAQ1YJ0
--
& Biases, but that is code-
specific: https://wandb.ai/site/ You can also use scontrol -d show job
to print out the layout of a job, including which specific GPUs were
assigned.
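For example (the job id is a placeholder; -d adds the per-node CPU_IDs and GRES index detail):
```
scontrol -d show job 123456 | grep -E 'Nodes=|CPU_IDs=|GRES='
```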
-Paul Edmon-
On 4/2/25 9:17 AM, Jason Simms via slurm-users wrote:
Hello all,
Apologies for the basic
for users it is not working as the UID/GID differ.
Is there a way we can overcome this issue and be able to run jobs?
Regards
Navin
--
ady give me a
hint where to look.
Thanks a lot
Matthias
--
On 4/4/25 5:23 am, Michael Milton via slurm-users wrote:
Plain srun re-uses the existing Slurm allocation, and specifying
resources like --mem will just request them from the current job rather
than submitting a new one
srun does that as it sees all the various SLURM_* environment variables
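A hedged illustration of the practical consequence: to get a genuinely new allocation from inside a job, use sbatch (which always creates a new job) rather than a bare srun; the script name and resources below are placeholders:
```
# From within an existing allocation: this queues a separate job with its own
# resources instead of carving a step out of the current one
sbatch --mem=8G --cpus-per-task=4 wrapped_job.sh
```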
> Looking at the srun man page, I could speculate that --clusters
> or --cluster-constraint might help in that regard (but I am not sure).
>
> Have a nice weekend
>
>
> On Fri, Apr 4, 2025 at 6:27 AM Michael Milton via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
&g
d/or time limits)
--
Tim Cutts
Senior Director, R&D IT - Data, Analytics & AI, Scientific Computing Platform
AstraZeneca
Hello David,
thank you, this might be a simple and viable solution to this problem. I'll
test both
(yours and Megan's) solutions and then decide.
Kind regards
--
On Sun, Mar 30, 2025 at 08:19:12AM -0600, Davide DelVento via slurm-users wrote:
Hi Kamil,
I don't use QoS, so I do
there may be something I overlooked.
On Mon, Mar 31, 2025 at 5:12 AM Massimo Sgaravatto via slurm-users <
slurm-users@lists.schedmd.com> wrote:
> Dear all
>
>
>
> We have just installed a small SLURM cluster composed of 12 nodes:
>
> - 6 CPU only nodes
un re-uses the existing Slurm allocation, and specifying
resources like --mem will just request them from the current job rather
than submitting a new one
What is the best solution here?
--
Hi Tim,
"Cutts, Tim via slurm-users"
writes:
> You can set a partition QoS which specifies a minimum. We have such a qos on
> our large-gpu partition; we don’t want people scheduling small stuff to it,
> so we
> have this qos:
How does this affect total throughput? P
's prerequisites are listed in the README.md file in [2],
namely the "gpustat" and "ClusterShell" tools.
Best regards,
Ole
[1] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
[2] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
On 4/2/25 15:1
d or what is now possible as a
result.
Warmest regards,
Jason
--
*Jason L. Simms, Ph.D., M.P.H.*
Research Computing Manager
Swarthmore College
Information Technology Services
(610) 328-8102
--
rom bb-local
(try to add '--skip-broken' to skip uninstallable packages or
'--nobest' to use not only best candidate packages)
I think I will try to build 24.05.7 or 24.05.3 as a next try but I'm
interested in any advice.
Thank you
Patrick
--
puNN and cpusingpuNN are physically the same node and whatever1 +
>> whatever2 is the actual maximum amount of memory you want Slurm to
>> allocate. And you will also want to make sure the Weight are such that the
>> non-GPU nodes get used first.
>>
>> Disclaimer: I
could submit to both the cpu
and the requeue partition (as slurm permits multipartition submissions)
and then the gpu partition won't be blocked by anything and you can farm
the spare gpu cycles. This works well for our needs.
-Paul Edmon-
On 3/31/2025 9:39 AM, Paul Raines via slurm-u
DelVento via slurm-users wrote:
External Email - Use Caution
Ciao Massimo,
How about creating another queue cpus_in_the_gpu_nodes (or something less
silly) which targets the GPU nodes but does not allow the allocation of the
GPUs with gres and allocates 96-8 (or whatever other number you
e sure the Weight are such that the
> non-GPU nodes get used first.
>
> Disclaimer: I'm thinking out loud, I have not tested this in practice,
> there may be something I overlooked.
> On Mon, Mar
option to each dict and then
updating all QoS individually, and your solution certainly helps with the
latter.
Kind regards
--
On Mon, Mar 31, 2025 at 12:08:49AM +, megan4slurm--- via slurm-users wrote:
Hi Kamil,
It is possible to set all QOS's "Preempt" value with two sacctmg
ster set Preempt=+low
> Modified qos...
> normal
> high
> Would you like to commit changes? (You have 30 seconds to decide)
> (N/y): y
> $ sacctmgr show qos format=name,preempt,preemptmode
> NamePreempt PreemptMode
> -- -- ---
>
--