[slurm-users] Re: Can't specify multiple partitions when submitting GPU jobs

2025-04-29 Thread Brian Andrus via slurm-users
Perhaps you could provide the exact error message or log output from a failed attempt Brian Andrus On 4/29/2025 7:56 AM, milad--- via slurm-users wrote: My partitions definition is super simple: ``` PartitionName=t4 Nodes=slurm-t4-[1-30] DEFAULT=YES MaxTime=INFINITE State=UP DefCpuPerGPU=16

[slurm-users] Re: Can't specify multiple partitions when submitting GPU jobs

2025-04-29 Thread milad--- via slurm-users
Nodes=slurm-a100-80gb-[1-30] MaxTime=INFINITE State=UP DefCpuPerGPU=12 DefMemPerGPU=85486 ``` I looked at PARTITION CONFIGURATION section of slurm.conf page but don't see anything that would relate to multiple partitions and/or number of tasks. -- slurm-users mailing list -- slurm-

[slurm-users] Re: Can't specify multiple partitions when submitting GPU jobs

2025-04-29 Thread Davide DelVento via slurm-users
Perhaps some of the partitions' defaults (maybe even implicit ones) are to blame? On Mon, Apr 28, 2025 at 7:56 AM milad--- via slurm-users < slurm-users@lists.schedmd.com> wrote: > Update: I also noticed that specifying --ntasks makes a difference when > --gpus is present. > > i

[slurm-users] Re: Error "_check_core_range_matches_sock" when

2025-04-29 Thread Gestió Servidors via slurm-users
is configured to total thread count on multi-threaded hardware and no other topology info ("Sockets=", "CoresPerSocket", etc.) is configured. […] So, I suppose, in version 23.11.7 SLURM corrected that behaviour. Could someone confirm that? Thanks. --

[slurm-users] Re: Error "_check_core_range_matches_sock" when starting "slurmctld"

2025-04-28 Thread Laura Hild via slurm-users
ntend to achieve with CPUs=... if the host is single-socket? https://slurm.schedmd.com/gres.conf.html#OPT_Cores has your answer though, I think. -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

[slurm-users] Re: Can't specify multiple partitions when submitting GPU jobs

2025-04-28 Thread milad--- via slurm-users
,h100 --gpus h100:1 script.sh Adding --ntasks: works ✅ sbatch -p a100,h100 --gpus h100:1 --ntasks 1 script.sh
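The working invocation above can also be assembled programmatically. This is only a minimal sketch: `build_sbatch_argv` is a hypothetical helper (not part of Slurm) that joins the partition list with commas and always passes an explicit `--ntasks`, because in this thread the multi-partition GPU submission is reported to work only once `--ntasks` is given.

```python
from typing import List


def build_sbatch_argv(partitions: List[str], gpus: str, script: str, ntasks: int = 1) -> List[str]:
    """Hypothetical helper: build an sbatch command line for a multi-partition GPU job."""
    if not partitions:
        raise ValueError("at least one partition is required")
    return [
        "sbatch",
        "-p", ",".join(partitions),  # comma-separated partition list, e.g. "a100,h100"
        "--gpus", gpus,              # GPU type and count, e.g. "h100:1"
        "--ntasks", str(ntasks),     # explicit task count, as in the working example from the thread
        script,
    ]


argv = build_sbatch_argv(["a100", "h100"], "h100:1", "script.sh")
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run` on a submit host; the sketch itself never calls Slurm.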

[slurm-users] Can't specify multiple partitions when submitting GPU jobs

2025-04-28 Thread milad--- via slurm-users
groups.google.com/g/slurm-users/c/UOUVfkajUBQ

[slurm-users] Error "_check_core_range_matches_sock" when starting "slurmctld"

2025-04-28 Thread Gestió Servidors via slurm-users
VAL with reason:gres/gpu GRES core specification 0-1 for node aopcvis5 doesn't match socket boundaries. (Socket 0 is cores 0-3) Where is my configuration error? Thanks.

[slurm-users] Re: How can we put limits on interactive jobs?

2025-04-25 Thread Ole Holm Nielsen via slurm-users
n job_desc.qos = "gpu_interactive" end if job_desc.partition == "cpu" then job_desc.qos = "cpu_interactive" end return slurm.SUCCESS Thanks Ewan -----Original Message- From: Ole Holm Nielsen via sl

[slurm-users] Re: Setting QoS with slurm 24.05.7

2025-04-25 Thread Patrick Begou via slurm-users
Us" is a limit on the number of CPUs the association can use. -- Michael On Fri, Apr 18, 2025 at 8:01 AM Patrick Begou via slurm-users wrote: Hi all, I'm trying to set up a QoS on a small 5-node cluster running slurm 24.05.7. My goal is to

[slurm-users] Re: How can we put limits on interactive jobs?

2025-04-25 Thread René Sitt via slurm-users
y one interactive job per user can be running at any given time. Cheers, René On 25.04.25 at 11:37, Ewan Roche via slurm-users wrote: Hello Ole, the way I identify interactive jobs is by checking that the script is empty in job_submit.lua. If it's the case then they're assigned to an

[slurm-users] Re: How can we put limits on interactive jobs?

2025-04-25 Thread Loris Bennett via slurm-users
Hi Ole, Ole Holm Nielsen via slurm-users writes: > We would like to put limits on interactive jobs (started by salloc) so > that users don't leave unused interactive jobs behind on the cluster > by mistake. > > I can't offhand find any configurations that limit in

[slurm-users] Re: How can we put limits on interactive jobs?

2025-04-25 Thread Ewan Roche via slurm-users
job_desc.qos = "gpu_interactive" end if job_desc.partition == "cpu" then job_desc.qos = "cpu_interactive" end return slurm.SUCCESS Thanks Ewan -Original Message- From: Ole Holm Nielsen via slurm-users
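The approach described in this thread (treat a job with an empty script as interactive, then pick a QoS by partition) can be modelled outside Slurm to reason about the decision logic. The real hook is the Lua code in job_submit.lua; the Python below is only an illustration. `JobDesc` and `assign_interactive_qos` are hypothetical names, and the partition-to-QoS mapping is passed in as a parameter because the preview above shows only the `cpu` to `cpu_interactive` branch in full.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JobDesc:
    """Hypothetical stand-in for the job_desc fields mentioned in the thread."""
    partition: str
    script: Optional[str] = None  # an empty or missing script is taken to mean "interactive"
    qos: Optional[str] = None


def assign_interactive_qos(job: JobDesc, qos_by_partition: Dict[str, str]) -> JobDesc:
    """Set an interactive QoS when the job has no batch script; leave batch jobs untouched."""
    is_interactive = not job.script
    if is_interactive and job.partition in qos_by_partition:
        job.qos = qos_by_partition[job.partition]
    return job


mapping = {"cpu": "cpu_interactive"}  # only the branch visible in the preview
print(assign_interactive_qos(JobDesc(partition="cpu"), mapping).qos)
print(assign_interactive_qos(JobDesc(partition="cpu", script="#!/bin/bash\n"), mapping).qos)
```

Limits such as "one running interactive job per user" would then live on the QoS itself rather than in the submit filter.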

[slurm-users] How can we put limits on interactive jobs?

2025-04-25 Thread Ole Holm Nielsen via slurm-users
r Department of Physics, Technical University of Denmark

[slurm-users] Re: Job running slower when using Slurm

2025-04-24 Thread Jeffrey Layton via slurm-users
Hmm.. Good idea. I'll start looking at that. Thanks! Jeff On Thu, Apr 24, 2025 at 11:02 AM Cutts, Tim via slurm-users < slurm-users@lists.schedmd.com> wrote: > I wonder whether there might be core-pinning/NUMA topology/hyperthreading > sort of thing going on here? > >

[slurm-users] Re: Job running slower when using Slurm

2025-04-24 Thread Cutts, Tim via slurm-users
n support you by visiting our Service Catalogue<https://azcollaboration.sharepoint.com/sites/CMU993> | From: Michael DiDomenico via slurm-users Date: Wednesday, 23 April 2025 at 7:53 pm To: Cc: Slurm User Community List Subject: [slurm-users] Re: Job running slower when using Slurm the

[slurm-users] Re: Job running slower when using Slurm

2025-04-24 Thread Jeffrey Layton via slurm-users
if you use any Python libs. > Best, > > Feng > > > On Wed, Apr 23, 2025 at 3:22 PM Jeffrey Layton via slurm-users < > slurm-users@lists.schedmd.com> wrote: > >> Roger. It's the code that prints out the threads it sees - I bet it is >> the cgroups. I ne

[slurm-users] Re: Job running slower when using Slurm

2025-04-23 Thread Feng Zhang via slurm-users
Besides Slurm options, you might also need to set the OpenMP environment variable: export OMP_NUM_THREADS=32 (the core count, not the thread count). The same goes for other similar environment variables if you use any Python libs. Best, Feng On Wed, Apr 23, 2025 at 3:22 PM Jeffrey Layton via slurm-users < slurm-us
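Rather than hard-coding 32, the thread count can be derived from what Slurm granted. A minimal sketch, assuming the job was submitted with `--cpus-per-task` so that Slurm sets `SLURM_CPUS_PER_TASK`; `omp_threads_from_slurm` is a hypothetical helper name, and whether a Slurm "CPU" is a core or a hardware thread depends on the site configuration.

```python
from typing import Mapping


def omp_threads_from_slurm(env: Mapping[str, str], default: int = 1) -> int:
    """Return the thread count to use for OMP_NUM_THREADS, based on SLURM_CPUS_PER_TASK."""
    raw = env.get("SLURM_CPUS_PER_TASK", "")
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    return default  # variable missing or malformed: fall back to a safe default


# Inside a job started with --cpus-per-task=32 the environment would contain this value.
print(omp_threads_from_slurm({"SLURM_CPUS_PER_TASK": "32"}))
# Outside Slurm (or without --cpus-per-task) the default is used.
print(omp_threads_from_slurm({}))
```

In a real job script the same idea is usually written in shell before launching the OpenMP binary; the Python form is just easier to test.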

[slurm-users] Re: Slurm webhooks

2025-04-23 Thread Davide DelVento via slurm-users
Thank you all. I had thought of writing my own, but I suspected it would be too large of a time sink. Your nudges (and example script) have convinced me otherwise, and in fact this is what I will do! Thanks again! On Tue, Apr 22, 2025 at 3:12 AM Bjørn-Helge Mevik via slurm-users < slurm-us

[slurm-users] Re: Job running slower when using Slurm

2025-04-23 Thread Jeffrey Layton via slurm-users
n, this is something in the code. Unfortunately, the code uses the time to compute Mop/s total and Mop/s/thread so a longer time means slower performance. Thanks! Jeff On Wed, Apr 23, 2025 at 2:53 PM Michael DiDomenico via slurm-users < slurm-users@lists.schedmd.com> wrote: > the progr

[slurm-users] Re: Job running slower when using Slurm

2025-04-23 Thread Michael DiDomenico via slurm-users
the program probably says 32 threads, because it's just looking at the box, not what slurm cgroups allow (assuming you're using them) for cpu. i think for an openmp program (not openmpi) you definitely want the first command with --cpus-per-task=32. are you measuring the runtime inside the program or
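The difference between "what the box has" and "what the job may use" can be checked from inside a job. A small sketch, assuming Linux and assuming the site constrains cores through cgroups, in which case the limit shows up in the process's CPU affinity mask; `visible_vs_allowed_cpus` is a hypothetical helper name.

```python
import os
from typing import Tuple


def visible_vs_allowed_cpus() -> Tuple[int, int]:
    """Return (CPUs the OS reports, CPUs this process is actually allowed to run on)."""
    total = os.cpu_count() or 1
    if hasattr(os, "sched_getaffinity"):
        # Linux only: the affinity mask reflects cpuset restrictions applied to the process.
        allowed = len(os.sched_getaffinity(0))
    else:
        allowed = total  # no affinity API on this platform; assume no restriction
    return total, allowed


total, allowed = visible_vs_allowed_cpus()
print(f"machine reports {total} CPUs, this process may use {allowed}")
```

If the two numbers differ inside a job while the program still starts one thread per reported CPU, the threads end up oversubscribed on the allowed cores, which would be consistent with the slowdown discussed here.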

[slurm-users] Re: Job running slower when using Slurm

2025-04-23 Thread Jeffrey Layton via slurm-users
I tried using ntasks and cpus-per-task to get all 32 cores. So I added --ntasks=# --cpus-per-task=N to the sbatch command so that it now looks like: sbatch --nodes=1 --ntasks=1 --cpus-per-task=32

[slurm-users] Re: Job running slower when using Slurm

2025-04-23 Thread Jeffrey Layton via slurm-users
Roger. I didn't configure Slurm so let me look at slurm.conf and gres.conf to see if they restrict a job to a single CPU. Thanks On Wed, Apr 23, 2025 at 1:48 PM Michael DiDomenico via slurm-users < slurm-users@lists.schedmd.com> wrote: > without knowing anything about your en

[slurm-users] Re: Job running slower when using Slurm

2025-04-23 Thread Michael DiDomenico via slurm-users
, 2025 at 1:28 PM Jeffrey Layton via slurm-users wrote: > > Good morning, > > I'm running an NPB test, bt.C that is OpenMP and built using NV HPC SDK > (version 25.1). I run it on a compute node by ssh-ing to the node. It runs in > about 19.6 seconds. > > Then I ru

[slurm-users] Job running slower when using Slurm

2025-04-23 Thread Jeffrey Layton via slurm-users
Thanks! Jeff

[slurm-users] Re: Strange output of sshare

2025-04-23 Thread Frank Schilder via slurm-users
ts and the issue disappeared for now. Best regards, Frank From: f...@adm.ku.dk Sent: 14 April 2025 13:37 To: slurm-users@lists.schedmd.com Subject: [slurm-users] Strange output of sshare Hi all, I'm trying to clean up and reconfigure fair share on

[slurm-users] Re: Setting QoS with slurm 24.05.7

2025-04-22 Thread Michael Gutteridge via slurm-users
n a user. Default is the > cluster's limit. To clear a previously set value use the modify command > with a new value of -1 for each TRES id. > >- sacctmgr(1) > > The "MaxCPUs" is a limit on the number of CPUs the association can use. > > -- Michael >

[slurm-users] Re: Slurm webhooks

2025-04-22 Thread Bjørn-Helge Mevik via slurm-users
Davide DelVento via slurm-users writes: > I've gotten a request to have Slurm notify users for the typical email > things (job started, completed, failed, etc) with a REST API instead of > email. This would allow notifications in MS Teams, Slack, or log stuff in > some inte

[slurm-users] Re: Setting QoS with slurm 24.05.7

2025-04-22 Thread Patrick Begou via slurm-users
he cluster's limit. To clear a previously set value use the modify command with a new value of -1 for each TRES id. - sacctmgr(1) The "MaxCPUs" is a limit on the number of CPUs the association can use. -- Michael On Fri, Apr 18, 2025 at 8:01 AM Patrick Begou via slurm-us

[slurm-users] problem with slurmd dynamic mode with slurmctld in docker

2025-04-22 Thread stef53864 via slurm-users
no goes down because of a ping timeout. So my question: can we have an option to force DNS resolution instead of IP discovery in dynamic mode? (I tried the cloud_dns option, but that does not seem to be its purpose.) Best regards, Stephane

[slurm-users] Re: Slurm webhooks

2025-04-21 Thread Kilian Cavalotti via slurm-users
on busy environments, but at least, Slurm can do it. Cheers, -- Kilian On Mon, Apr 21, 2025 at 11:46 AM Davide DelVento via slurm-users wrote: > > Happy Monday everybody, > > I've gotten a request to have Slurm notify users for the typical email things > (job started, complet

[slurm-users] Re: pam_slurm_adopt ambiguity in instructions and with ssd

2025-04-21 Thread Robert Kudyba via slurm-users
account [default=bad success=ok user_unknown=ignore] pam_sss.so So how can we configure this to work around sssd? On Sat, Apr 19, 2025 at 4:47 AM Ole Holm Nielsen via slurm-users < slurm-users@lists.schedmd.com> wrote: > Hi Robert, > > The pam_slurm_adopt has worked well and wit

[slurm-users] Slurm webhooks

2025-04-21 Thread Davide DelVento via slurm-users
ces, because my gut feeling is that somebody must have already had such an itch to scratch! Any other ideas about alternative ways to accomplish this? Thanks

[slurm-users] knl_generic.conf seems to be ignored and Slurm adopts default settings which specify Intel motherboard and syscfg - need help to figure out why Slurm is not reading my Dell knl_generic.c

2025-04-19 Thread Bryan Johnston via slurm-users
HPC | www.chpc.ac.za | NICIS | nicis.ac.za Centre for High Performance Computing

[slurm-users] Re: pam_slurm_adopt ambiguity in instructions and with ssd

2025-04-19 Thread Ole Holm Nielsen via slurm-users
"? /Ole On 18-04-2025 19:09, Robert Kudyba via slurm-users wrote: Thanks Ole and Massimo, I definitely do not have UsePAM=1 in slurm.conf. I commented out pam_systemd here: grep pam_systemd * fingerprint-auth:-session optional pam_systemd.so fingerprint-auth-ac:-session

[slurm-users] Re: pam_slurm_adopt ambiguity in instructions and with ssd

2025-04-18 Thread Robert Kudyba via slurm-users
> I am asking because I had this problem when I configured the > pam_slurm_adopt > > Cheers, Massimo > > > On Fri, Apr 18, 2025 at 5:28 PM Robert Kudyba via slurm-users < > slurm-users@lists.schedmd.com> wrote: > >> In the instructions for pam_slurm_adopt &

[slurm-users] Re: pam_slurm_adopt ambiguity in instructions and with ssd

2025-04-18 Thread Massimo Sgaravatto via slurm-users
Hi Did you disable the pam_systemd.so also from the module files included by the sshd pam file ? I am asking because I had this problem when I configured the pam_slurm_adopt Cheers, Massimo On Fri, Apr 18, 2025 at 5:28 PM Robert Kudyba via slurm-users < slurm-users@lists.schedmd.com>

[slurm-users] Re: pam_slurm_adopt ambiguity in instructions and with ssd

2025-04-18 Thread Ole Holm Nielsen via slurm-users
internet, but that is bad advice :-( You mention CentOS, but that OS has been dead for a long time... IHTH, Ole On 18-04-2025 17:26, Robert Kudyba via slurm-users wrote: In the instructions for pam_slurm_adopt <https://slurm.schedmd.com/pam_slurm_adopt.html#ssh_config>, there are instru

[slurm-users] Setting QoS with slurm 24.05.7

2025-04-18 Thread Patrick Begou via slurm-users
So I must have missed something ? My partition (I've only one) in slurm.conf is: PartitionName=genoa State=UP Default=YES MaxTime=48:00:00 DefaultTime=24:00:00 Shared=YES OverSubscribe=NO Nodes=osorno-0-[0-4] Thanks Patrick

[slurm-users] pam_slurm_adopt ambiguity in instructions and with ssd

2025-04-18 Thread Robert Kudyba via slurm-users
sss(sshd:account): Access denied for user user: 6 (Permission denied) Apr 18 11:13:41 node11 sshd[33355]: fatal: Access denied for user user by PAM account configuration [preauth] Am I missing something?

[slurm-users] Re: Setting QoS with slurm 24.05.7

2025-04-18 Thread Michael Gutteridge via slurm-users
ult is the cluster's limit. To clear a previously set value use the modify command with a new value of -1 for each TRES id. - sacctmgr(1) The "MaxCPUs" is a limit on the number of CPUs the association can use. -- Michael On Fri, Apr 18, 2025 at 8:01 AM Patrick Begou via

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-16 Thread lyz--- via slurm-users
Hi Chris! I didn't modify the cgroup configuration file; I only upgraded the Slurm version. After that, the limitations worked successfully. It's quite odd. lyz

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-16 Thread Chris Samuel via slurm-users
Hiya! On 16/4/25 12:56 am, lyz--- via slurm-users wrote: I've tried version 23.11.10. It does work. Oh that's wonderful, so glad it helped! It did seem quite odd that it wasn't working for you before then. I wonder if this was a cgroups v1 vs cgroups v2 thing? All

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-16 Thread lyz--- via slurm-users
d the GPUs using this command, I saw the expected number of GPUs: srun -p gpu --gres=gpu:2 --nodelist=node11 --pty nvidia-smi Thank you very much for your guidance. Best luck Lyz

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-15 Thread lyz--- via slurm-users
Hi Chris! The cgroup.conf on my GPU node is the same as on the head node. Its contents are as follows: CgroupAutomount=yes ConstrainCores=yes ConstrainRAMSpace=yes ConstrainDevices=yes I'll try a newer version of Slurm.

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-15 Thread Christopher Samuel via slurm-users
Hiya, On 4/15/25 7:03 pm, lyz--- via slurm-users wrote: Hi, Chris. Thank you for continuing to pay attention to this issue. I followed your instruction, and this is the output: [root@head1 ~]# systemctl cat slurmd | fgrep Delegate Delegate=yes That looks good to me, thanks for sharing that

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-15 Thread Christopher Samuel via slurm-users
On 4/15/25 6:57 pm, lyz--- via slurm-users wrote: Hi, Sean. It's the latest slurm version. [root@head1 ~]# sinfo --version slurm 22.05.3 That's quite old (and no longer supported), the oldest still supported version is 23.11.10 and 24.11.4 came out recently. What does the cgroup

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-15 Thread lyz--- via slurm-users
Hi, Chris. Thank you for continuing to pay attention to this issue. I followed your instruction, and this is the output: [root@head1 ~]# systemctl cat slurmd | fgrep Delegate Delegate=yes lyz

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-15 Thread lyz--- via slurm-users
=/dev/nvidia0 Name=gpu File=/dev/nvidia1 Name=gpu File=/dev/nvidia2 Name=gpu File=/dev/nvidia3 Name=gpu File=/dev/nvidia4 Name=gpu File=/dev/nvidia5 Name=gpu File=/dev/nvidia6 Name=gpu File=/dev/nvidia7 # END AUTOGENERATED SECTION -- DO NOT REMOVE

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-15 Thread Christopher Samuel via slurm-users
On 4/15/25 12:55 pm, Sean Crosby via slurm-users wrote: What version of Slurm are you running and what's the contents of your gres.conf file? Also what does this say? systemctl cat slurmd | fgrep Delegate -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-15 Thread Sean Crosby via slurm-users
What version of Slurm are you running and what's the contents of your gres.conf file? Sean From: lyz--- via slurm-users Sent: Tuesday, April 15, 2025 11:16:40 PM To: slurm-users@lists.schedmd.com Subject: [slurm-users] Re: [EXT] Re: Issue with Enforcin

[slurm-users] pam_slurm_adopt and multiple jobs on the same worker node

2025-04-15 Thread Massimo Sgaravatto via slurm-users
roups, you will be "confined" to the resources assigned to this "last" job. Is it possible in some way to specify the job to be mapped, in case there are multiple jobs for that user on the same node ? Thanks, Massimo

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-15 Thread lyz--- via slurm-users
e the restriction applies to the physical GPU hardware, but it doesn't take effect for CUDA.

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-15 Thread Sean Crosby via slurm-users
You need to add ConstrainDevices=yes to your cgroup.conf and restart slurmd on your nodes. This is the setting that restricts a job to only the GRES it requests. Sean From: lyz--- via slurm-users Sent: Tuesday, April 15, 2025 8:29:41 PM To: slurm
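Before restarting slurmd it can help to confirm that the setting is really present on every node. A rough sketch of such a check: `constrains_devices` is a hypothetical helper that scans the text of a cgroup.conf for `ConstrainDevices=yes`. It treats the key as case-insensitive and a missing key as "no", which matches my understanding of Slurm's defaults but should be verified against the cgroup.conf man page for your version.

```python
def constrains_devices(cgroup_conf_text: str) -> bool:
    """Return True if the given cgroup.conf text enables ConstrainDevices."""
    for line in cgroup_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() == "constraindevices":
            return value.lower() == "yes"
    return False  # key absent: assume device constraint is off


sample = """\
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
"""
print(constrains_devices(sample))
print(constrains_devices("ConstrainCores=yes\n"))
```

The sample text mirrors the cgroup.conf quoted elsewhere in this thread; on a cluster the function would be fed the file contents read from each node.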

[slurm-users] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-15 Thread lyz--- via slurm-users
00) if __name__ == "__main__": test_gpu() When I run this script, it still bypasses the resource restrictions set by cgroup. Are there any other ways to solve this problem?

[slurm-users] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-14 Thread Christopher Samuel via slurm-users
On 4/14/25 6:27 am, lyz--- via slurm-users wrote: This command is intended to limit user 'lyz' to using a maximum of 2 GPUs. However, when the user submits a job using srun, specifying CUDA 0, 1, 2, and 3 in the job script, or os.environ["CUDA_VISIBLE_DEVICES"] = "

[slurm-users] Strange output of sshare

2025-04-14 Thread frsc--- via slurm-users
or every user. I'm grateful for any pointer for what to look for. Best regards, Frank

[slurm-users] Issue with Enforcing GPU Usage Limits in Slurm

2025-04-14 Thread lyz--- via slurm-users
eing enforced as expected. How can I resolve this situation.

[slurm-users] Re: pam_slurm_adopt and multiple jobs on the same worker node

2025-04-14 Thread Massimo Sgaravatto via slurm-users
-overlap --jobid JOBIDNUM bash > > > > -- Paul Raines (http://help.nmr.mgh.harvard.edu) > > > > On Mon, 14 Apr 2025 4:30am, Massimo Sgaravatto via slurm-users wrote: > > >External Email - Use Caution > > > > Dear all > > > > With the pam_sl

[slurm-users] Re: pam_slurm_adopt and multiple jobs on the same worker node

2025-04-14 Thread Paul Raines via slurm-users
Instead of using pam_slurm_adopt your users can get a shell on the node of a specific job in that job's "mapped" space by running srun --pty --overlap --jobid JOBIDNUM bash -- Paul Raines (http://help.nmr.mgh.harvard.edu) On Mon, 14 Apr 2025 4:30am, Massimo Sgaravatto
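For sites that wrap this in a small tool, the command suggested above can be built from a job ID. A minimal sketch, assuming a hypothetical helper `build_job_shell_argv`; it uses only the options quoted in the message (`--pty`, `--overlap`, `--jobid`) and does not contact Slurm itself.

```python
from typing import List


def build_job_shell_argv(job_id: int, shell: str = "bash") -> List[str]:
    """Hypothetical helper: command line for an interactive shell inside an existing job."""
    if job_id <= 0:
        raise ValueError("job_id must be a positive integer")
    return ["srun", "--pty", "--overlap", "--jobid", str(job_id), shell]


print(" ".join(build_job_shell_argv(12345)))
```

Because the user names the job explicitly, the shell lands in that job's resources rather than in whichever job pam_slurm_adopt would pick.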

[slurm-users] slurmrestd via unix socket

2025-04-10 Thread Brian Andrus via slurm-users
8, "mode": "backup" } Other commands fail with: "error_number": 1007, "error": "Protocol authentication error", I'll admit, I don't usually use sockets, so I could easily be overlooking something there. Permissions on the socket look right. I am getting json back, so it is connecting. Note: slurmrestd is running under its own user (not root and not slurmuser). Any ideas? Thanks in advance, Brian Andrus

[slurm-users] Re: Minimum cpu cores per node partition level configuration

2025-04-10 Thread Cutts, Tim via slurm-users
Director, R&D IT - Data, Analytics & AI, Scientific Computing Platform AstraZeneca Find out more about R&D IT Data, Analytics & AI and how we can support you by visiting our Service Catalogue<https://azcollaboration.sharepoint.com/sites/CMU993> | From: Loris Bennett via slurm-

[slurm-users] weird sacct behavior?

2025-04-10 Thread Pierre Abele via slurm-users
and Evolution Deutscher Platz 6 04103 Leipzig Room: U2.80 E-Mail: pierre_ab...@eva.mpg.de Phone: +49 (0) 341 3550 245

[slurm-users] Re: slurmctld HA ; backup controller doesn't schedule and start any job

2025-04-09 Thread Hiromasa Watanabe via slurm-users
know why, but this is OK. Thanks, Hiro

[slurm-users] errors while trying to setup slurmdbd.

2025-04-09 Thread Steven Jones via slurm-users
ch cluster [root@vuwunicohpcdbp1 admjonesst1]# regards Steven

[slurm-users] Re: errors while trying to setup slurmdbd.

2025-04-09 Thread Steven Jones via slurm-users
Lik duh ty regards Steven Jones B.Eng (Hons) Technical Specialist - Linux RHCE Victoria University, Digital Solutions, Level 8 Rankin Brown Building, Wellington, NZ 6012 0064 4 463 6272 From: Christopher Samuel via slurm-users Sent: Thursday, 10

[slurm-users] Re: errors while trying to setup slurmdbd.

2025-04-09 Thread Christopher Samuel via slurm-users
Hi Steven, On 4/9/25 5:00 pm, Steven Jones via slurm-users wrote: Apr 10 10:28:52 vuwunicohpcdbp1.ods.vuw.ac.nz slurmdbd[2413]: slurmdbd: fatal: This host not configured to run SlurmDBD ((vuwunicohpcdbp1 or vuwunicohp> ^^^ that's the critical error message, and it's reporting

[slurm-users] pam error, related to accounting?

2025-04-09 Thread David Bremner via slurm-users
pam_access.so session required pam_unix.so session optional pam_systemd.so Accounting seems to be working OK. Does anyone know what PAM related code paths could be triggered by enabling accounting? d

[slurm-users] recommended freeIPMI version

2025-04-09 Thread Heckes, Frank via slurm-users
.tar.gz). Does anyone know whether this is the best choice? Many thanks in advance for any advice. Cheers, -Frank

[slurm-users] Re: recommended freeIPMI version

2025-04-09 Thread Ole Holm Nielsen via slurm-users
On 09-04-2025 18:23, Daniel Letai via slurm-users wrote: Although 1.6.15 is latest and greatest, there is already a patch https://lists.gnu.org/archive/html/freeipmi-devel/2025-02/msg0.html for an issue that was severe enough to fail to build on fedora42 https://bugzilla.redhat.com

[slurm-users] Re: recommended freeIPMI version

2025-04-09 Thread Heckes, Frank via slurm-users
Hello Ole, Hello Daniel, Many thanks for the quick reply and the information. That was what I was looking for. Thanks a lot. Cheers, -Frank From: Daniel Letai via slurm-users Sent: Wednesday, 9 April 2025 18:24 To: slurm-users@lists.schedmd.com Subject: [slurm-users] Re: recommended

[slurm-users] Re: recommended freeIPMI version

2025-04-09 Thread Daniel Letai via slurm-users
already in the new srpm in koji, so I would simply download that and see if it can be built for rh https://koji.fedoraproject.org/koji/buildinfo?buildID=2674703 On 09/04/2025 18:43, Ole Holm Nielsen via slurm-users wrote: Hi Frank

[slurm-users] Re: recommended freeIPMI version

2025-04-09 Thread Ole Holm Nielsen via slurm-users
/Slurm_configuration/#ipmi-power-monitoring We have used the FreeIPMI plugin for a long time, and it works just great! We just upgraded to Slurm 24.11.4 today :-) On 4/9/25 17:19, Heckes, Frank via slurm-users wrote: I’d like to update to SLURM version 24.11.4. I was searching for a recommendation for freeIPMI

[slurm-users] slurmctld HA ; backup controller doesn't schedule and start any job

2025-04-09 Thread hiromasa.watanabe--- via slurm-users
urmUser=slurm StorageType=accounting_storage/mysql StorageHost=gateway1 StoragePass=mypassword StorageUser=slurm Best regards, Hiro

[slurm-users] Slurm version 24.11.4 is now available

2025-04-08 Thread Marshall Garey via slurm-users
e-cases of jobs incorrectly pending held when --prefer features are not initially satisfied. -- slurmctld - Fix jobs incorrectly held when --prefer not satisfied in some use-cases. -- Ensure RestrictedCoresPerGPU and CoreSpecCount don't overlap.

[slurm-users] Re: slurm send email status with no details

2025-04-06 Thread Ole Holm Nielsen via slurm-users
On 4/6/25 22:56, Oren via slurm-users wrote: Hi, We set up a slurm system with email notification, this is the slurm.conf `MailProg=/usr/sbin/sendmail` But the email that I get has no status, just an empty message: image.png no subject, no info, what are we missing? The image says that

[slurm-users] Re: Run a command in Slurm with all streams and signals connected to the submitting command

2025-04-06 Thread Michael Milton via slurm-users
variable that controls whether srun is inside the same job or not. Unsetting SLURM_CPU_BIND is needed to avoid "CPU binding outside of job step allocation". Cheers On Sat, Apr 5, 2025 at 3:39 PM Chris Samuel via slurm-users < slurm-users@lists.schedmd.com> wrote: > On 4/4/

[slurm-users] slurm send email status with no details

2025-04-06 Thread Oren via slurm-users
Hi, We set up a slurm system with email notification, this is the slurm.conf `MailProg=/usr/sbin/sendmail` But the email that I get has no status, just an empty message: [image: image.png] no subject, no info, what are we missing? Thanks~

[slurm-users] Re: slurm releases

2025-04-05 Thread Ryan Novosielski via slurm-users
Computing - MSB A555B, Newark `' On Apr 1, 2025, at 12:41, Patrick Begou via slurm-users wrote: Hi slurm team, I would ask some clarifications with slurm releases. Why two versions of slurm are available ? I speak of 24.05.7 versus 24.11.3 on https://www.schedmd.com/slurm-support/re

[slurm-users] cpus and gpus partitions and how to optimize the resource usage

2025-04-05 Thread Massimo Sgaravatto via slurm-users
de) because it would mean having 3 partitions (if I have got it right): two partitions for CPU-only jobs and 1 partition for GPU jobs. Many thanks, Massimo [*] https://groups.google.com/g/slurm-users/c/IUd7jLKME3M [**] https://groups.google.com/g/slurm-users/c/o7AiYAQ1YJ0

[slurm-users] Re: Best Way to See GPUs in Use?

2025-04-05 Thread Paul Edmon via slurm-users
& Biases but that is code specific: https://wandb.ai/site/ You can also use scontrol -d show job to print out the layout of a job including which specific GPUs were assigned. -Paul Edmon- On 4/2/25 9:17 AM, Jason Simms via slurm-users wrote: Hello all, Apologies for the basic

[slurm-users] Slurm job submission using different UID/GID

2025-04-05 Thread navin srivastava via slurm-users
for users it is not working as the UID/GID differ. Is there a way we can overcome this issue and be able to run jobs? Regards Navin

[slurm-users] Slurm 24.05 and OpenMPI

2025-04-04 Thread Matthias Leopold via slurm-users
ady give me a hint where to look. Thanks a lot Matthias

[slurm-users] Re: Run a command in Slurm with all streams and signals connected to the submitting command

2025-04-04 Thread Chris Samuel via slurm-users
On 4/4/25 5:23 am, Michael Milton via slurm-users wrote: Plain srun re-uses the existing Slurm allocation, and specifying resources like --mem will just request them from the current job rather than submitting a new one. srun does that as it sees all the various SLURM_* environment variables
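Since srun decides based on the SLURM_* variables it inherits, one way to experiment is to launch it with those variables removed. The sketch below is an assumption-laden illustration, not a confirmed recipe from the thread: `env_without_slurm` is a hypothetical helper that drops every `SLURM_`-prefixed variable except the ones listed in `keep` (here `SLURM_CONF`, so a non-default config location is not lost). It has not been validated against a real cluster.

```python
from typing import Dict, Iterable, Mapping


def env_without_slurm(env: Mapping[str, str], keep: Iterable[str] = ("SLURM_CONF",)) -> Dict[str, str]:
    """Return a copy of env with SLURM_* variables removed, except those named in keep."""
    keep_set = set(keep)
    return {
        name: value
        for name, value in env.items()
        if not name.startswith("SLURM_") or name in keep_set
    }


inside_job = {
    "PATH": "/usr/bin",
    "SLURM_JOB_ID": "4242",
    "SLURM_CPU_BIND": "quiet,mask_cpu:0x1",
    "SLURM_CONF": "/etc/slurm/slurm.conf",
}
print(sorted(env_without_slurm(inside_job)))
```

The cleaned mapping could be passed as `env=` to `subprocess.run(["srun", ...])`, so the child srun no longer sees the enclosing job.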

[slurm-users] Re: Run a command in Slurm with all streams and signals connected to the submitting command

2025-04-04 Thread Michael Milton via slurm-users
> Looking at the srun man page, I could speculate that --clusters > or --cluster-constraint might help in that regard (but I am not sure). > > Have a nice weekend > > > On Fri, Apr 4, 2025 at 6:27 AM Michael Milton via slurm-users < > slurm-users@lists.schedmd.com> wrote: >

[slurm-users] Re: Minimum cpu cores per node partition level configuration

2025-04-04 Thread Cutts, Tim via slurm-users
d/or time limits) -- Tim Cutts Senior Director, R&D IT - Data, Analytics & AI, Scientific Computing Platform AstraZeneca Find out more about R&D IT Data, Analytics & AI and how we can support you by visiting our Service Catalogue<https://azcollaboration.sharepoint.com/sites/CM

[slurm-users] Re: Preemption question

2025-04-04 Thread Kamil Wilczek via slurm-users
Hello Davide, thank you, this might be a simple and viable solution to this problem. I'll test both (yours and Megan's) solutions and then decide. Kind regards -- On Sun, Mar 30, 2025 at 08:19:12AM -0600, Davide DelVento via slurm-users wrote: Hi Kamil, I don't use QoS, so I do

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-04-04 Thread Davide DelVento via slurm-users
there may be something I overlooked. On Mon, Mar 31, 2025 at 5:12 AM Massimo Sgaravatto via slurm-users < slurm-users@lists.schedmd.com> wrote: > Dear all > > > > We have just installed a small SLURM cluster composed of 12 nodes: > > - 6 CPU only nodes

[slurm-users] Run a command in Slurm with all streams and signals connected to the submitting command

2025-04-04 Thread Michael Milton via slurm-users
un re-uses the existing Slurm allocation, and specifying resources like --mem will just request them from the current job rather than submitting a new one What is the best solution here? -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

[slurm-users] Re: Minimum cpu cores per node partition level configuration

2025-04-03 Thread Loris Bennett via slurm-users
Hi Tim, "Cutts, Tim via slurm-users" writes: > You can set a partition QoS which specifies a minimum. We have such a qos on > our large-gpu partition; we don’t want people scheduling small stuff to it, > so we > have this qos: How does this affect total throughput? P

[slurm-users] Re: Best Way to See GPUs in Use?

2025-04-02 Thread Ole Holm Nielsen via slurm-users
's prerequisites are listed in the README.md file in [2], namely the "gpustat" and "ClusterShell" tools. Best regards, Ole [1] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat [2] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs On 4/2/25 15:1

[slurm-users] Best Way to See GPUs in Use?

2025-04-02 Thread Jason Simms via slurm-users
d or what is now possible as a result. Warmest regards, Jason -- *Jason L. Simms, Ph.D., M.P.H.* Research Computing Manager Swarthmore College Information Technology Services (610) 328-8102 -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

[slurm-users] slurm releases

2025-04-01 Thread Patrick Begou via slurm-users
rom bb-local (try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages) I think I will try to build 24.05.7 or 24.05.3 as a next try but I'm interested in any advice. Thank you Patrick -- slurm-users mailing

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-04-01 Thread Davide DelVento via slurm-users
puNN and cpusingpuNN are physically the same node and whatever1 + >> whatever2 is the actual maximum amount of memory you want Slurm to >> allocate. And you will also want to make sure the Weight are such that the >> non-GPU nodes get used first. >> >> Disclaimer: I

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-03-31 Thread Paul Edmon via slurm-users
could submit to both the cpu and the requeue partition (as slurm permits multipartition submissions) and then the gpu partition won't be blocked by anything and you can farm the spare GPU cycles. This works well for our needs. -Paul Edmon- On 3/31/2025 9:39 AM, Paul Raines via slurm-u

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-03-31 Thread Paul Raines via slurm-users
DelVento via slurm-users wrote: External Email - Use Caution Ciao Massimo, How about creating another queue cpus_in_the_gpu_nodes (or something less silly) which targets the GPU nodes but does not allow the allocation of the GPUs with gres and allocates 96-8 (or whatever other number you

[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-03-31 Thread Massimo Sgaravatto via slurm-users
e sure the Weight are such that the > non-GPU nodes get used first. > > Disclaimer: I'm thinking out loud, I have not tested this in practice, > there may be something I overlooked. > > On Mon, Mar

[slurm-users] Re: Preemption question

2025-03-31 Thread Kamil Wilczek via slurm-users
option to each dict and then updating all QoS individually, and your solution certainly helps with the latter. Kind regards -- On Mon, Mar 31, 2025 at 12:08:49AM +, megan4slurm--- via slurm-users wrote: Hi Kamil, It is possible to set all QOS's "Preempt" value with two sacctmg

[slurm-users] Re: Preemption question

2025-03-30 Thread megan4slurm--- via slurm-users
ster set Preempt=+low > Modified qos... > normal > high > Would you like to commit changes? (You have 30 seconds to decide) > (N/y): y > $ sacctmgr show qos format=name,preempt,preemptmode > NamePreempt PreemptMode > -- -- --- >

[slurm-users] Preemption question

2025-03-30 Thread Kamil Wilczek via slurm-users
penpgp.org/] [D415917E84B8DA5A60E853B6E676ED061316B69B] -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
