[slurm-users] Re: Increasing SlurmdTimeout beyond 300 Seconds

2024-02-12 Thread Timony, Mick via slurm-users
We set SlurmdTimeout=600. The docs give a maximum value of 65533 seconds:

https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdTimeout

The FAQ also has info about SlurmdTimeout. The worst thing that could happen is 
that it will take longer for nodes to be marked as down:
>A node is set DOWN when the slurmd daemon on it stops responding for 
>SlurmdTimeout as defined in slurm.conf.

https://slurm.schedmd.com/faq.html

I wouldn't set it too high, but what counts as too high vs. too low will vary 
from site to site, depending on how busy your controllers and your network are.
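
For reference, a minimal sketch of the relevant slurm.conf line, using the 
600-second value we run with (tune it for your own site; the parameter is in 
seconds and tops out at 65533):

# slurm.conf (illustrative fragment)
SlurmdTimeout=600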

​Regards
--Mick

From: Bjørn-Helge Mevik via slurm-users 
Sent: Monday, February 12, 2024 7:16 AM
To: slurm-us...@schedmd.com 
Subject: [slurm-users] Re: Increasing SlurmdTimeout beyond 300 Seconds

We've been running one cluster with SlurmdTimeout = 1200 sec for a
couple of years now, and I haven't seen any problems due to that.

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Job submitted to multiple partitions not running when any partition is full

2024-07-09 Thread Timony, Mick via slurm-users
Hi Paul,

There could be multiple reasons why the job isn't running, from the user's QOS 
to your cluster hitting MaxJobCount. This page might help:

https://slurm.schedmd.com/high_throughput.html

The output of the following command might help:

scontrol show job 465072
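
A few other commands might also help narrow down why it stays pending, 
including the state of the node and partition you mention below (the job ID 
and names are the ones from your message):

# Scheduler-relevant fields of the job
scontrol show job 465072 | grep -E 'JobState|Reason|Priority|QOS|Partition'

# Priority factors Slurm computed for the job
sprio -j 465072

# State of the idle-looking node and the pubgpu partition
scontrol show node leo
scontrol show partition pubgpu

# How close the cluster is to MaxJobCount
scontrol show config | grep -i MaxJobCount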

Regards
--
Mick Timony
Senior DevOps Engineer
Harvard Medical School
--


From: Paul Raines via slurm-users 
Sent: Tuesday, July 9, 2024 9:24 AM
To: slurm-users 
Subject: [slurm-users] Job submitted to multiple partitions not running when 
any partition is full


I have a job 465072 submitted to multiple partitions (rtx6000,rtx8000,pubgpu)

   JOBID  PARTITION  PENDING  PRIORITY    TRES_ALLOC|REASON
 4650727  rtx6000      47970  0.00367972  cpu=5,mem=400G,node=1,gpu=1|Priority
 4650727  rtx8000      47970  0.00367972  cpu=5,mem=400G,node=1,gpu=1|Priority
 4650727  pubgpu       47970  0.00367972  cpu=5,mem=400G,node=1,gpu=1|Priority
 4646926  rtx6000     487048  0.00121987  cpu=10,mem=32G,node=1,gpu=1|Priority,Resources
 4650186  rtx8000      56979  0.          cpu=4,mem=10G,node=1,gpu=1|Priority,Resources

We see that the two partitions rtx6000 and rtx8000 are full and two other
jobs are at the top of the queue waiting to run on them.  But partition
pubgpu is NOT full, and you can see here that node leo has the resources to
run job 4650727

HOST   PARTITION   CORES    MEMORY          GPUS
leo    pubgpu      48/ 64   12288/1030994   0/ 1
leo    pubcpu      48/ 64   12288/1030994   0/ 1

The node leo is NOT part of the rtx6000 or rtx8000 partitions, and there
are no other pending jobs waiting on either the pubgpu or pubcpu partition
that leo is part of.

So why is 4650727 not running on the pubgpu partition?

---
Paul Raines http://help.nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street, Charlestown, MA 02129 USA





--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com



[slurm-users] Re: Temporarily bypassing pam_slurm_adopt.so

2024-07-09 Thread Timony, Mick via slurm-users
At HMS we do the same as Paul's cluster and specify the groups we want to have 
access to all our compute nodes. We allow two groups, representing our DevOps 
team and our Research Computing consultants, to have access, with corresponding 
sudo rules for each group that allow different command sets to be run.

The Slurm docs mention how /etc/security/access.conf could be configured at:

https://slurm.schedmd.com/pam_slurm_adopt.html#admin_access

Here's an example of how /etc/security/access.conf could be configured:


+ :sysadmin_group:ALL
+ :researchcomputing_group:ALL
# All other users should be denied to get access from all sources.
- :ALL:ALL
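
And, purely to illustrate the sudo side mentioned above, something along these 
lines (the group names follow the access.conf example; the command lists are 
site-specific and made up here):

# /etc/sudoers.d/cluster-admins (illustrative only)
%sysadmin_group           ALL=(ALL) ALL
%researchcomputing_group  ALL=(root) /usr/bin/systemctl restart slurmd, /usr/bin/journalctl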

Kind regards
Mick

--


From: Paul Edmon via slurm-users 
Sent: Tuesday, July 9, 2024 9:34 AM
To: slurm-users@lists.schedmd.com 
Subject: [slurm-users] Re: Temporarily bypassing pam_slurm_adopt.so

We do this by adding groups/users to /etc/security/access.conf. That should
grant normal ssh access, assuming you still have pam_access.so in your sshd
config.  Note that if the user has a job on the node, slurm will still shunt
them into that job even with the access.conf setting, so when the job ends
the user's session will also end. However, if the user has no job on that
node, then they can ssh as normal to that host without any problem.
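
For reference, the pam_access hook usually amounts to one extra line in
/etc/pam.d/sshd ahead of pam_slurm_adopt; the control flag (sufficient vs.
required) depends on how you want the two modules to interact, so treat this
as a sketch rather than a drop-in:

# /etc/pam.d/sshd (illustrative fragment)
account    sufficient    pam_access.so
account    required      pam_slurm_adopt.so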

-Paul Edmon-

On 7/8/2024 5:48 PM, Chris Taylor via slurm-users wrote:
> On my Rocky9 cluster I got this to work fine also-
>
> Added at the end of /etc/pam.d/sshd:
>
> account    sufficient    pam_listfile.so item=user sense=allow onerr=fail file=/etc/slurm/allowed_users_file
> account    required      pam_slurm_adopt.so
>
> I added a couple of usernames to /etc/slurm/allowed_users_file and they can 
> SSH to the node without a job or allocation there.
>
> Chris
>
>> On 07/08/2024 2:07 PM PDT David Schanzenbach via slurm-users 
>>  wrote:
>>
>>
>> Hi Daniel,
>>
>>   Utilizing pam_access with pam_slurm_adopt might be what you are looking 
>> for?
>>   https://slurm.schedmd.com/pam_slurm_adopt.html#admin_access
>>
>>   Thanks,
>>   David
>>
>>
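
(For anyone reusing Chris's pam_listfile approach above: with item=user, the
allowed_users_file is just a list of usernames, one per line; the names here
are placeholders:)

alice
bob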

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com



[slurm-users] Re: Do I have to hold back RAM for worker nodes?

2025-05-12 Thread Timony, Mick via slurm-users
We do something very similar at HMS. For instance, on our nodes with 257468MB of 
RAM we round RealMemory down to 257000MB; for nodes with 1031057MB of RAM we 
round down to 100, etc.

We may tune this on our next OS and Slurm update, as I expect to see more memory 
used by the OS as we migrate to RHEL9.
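
As a sketch, that ends up as node definitions along these lines in slurm.conf 
(the node names and CPU counts are placeholders; slurmd -C on the node reports 
the detected memory you round down from):

NodeName=compute-a-[01-10] CPUs=64 RealMemory=257000 State=UNKNOWN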

Cheers

--
Mick Timony
Senior DevOps Engineer
LASER, Longwood, & O2 Cluster Admin
Harvard Medical School
--

From: Paul Edmon via slurm-users 
Sent: Monday, May 12, 2025 10:14 AM
To: slurm-users@lists.schedmd.com 
Subject: [slurm-users] Re: Do I have to hold back RAM for worker nodes?


The way we typically do it here is that we look at how much memory the OS uses 
when the system is idle, and then reserve the nearest power of 2 for that. For instance 
right now we have 16 GB set for our MemSpecLimit. That may seem like a lot but 
our nodes typically have 1 TB of memory so 16 GB is not that much. The newer 
hardware tends to eat up more base memory, at least from my experience.
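
A sketch of what that looks like in slurm.conf (the node name and other values 
are placeholders; 16 GB = 16384 MB set aside for the OS, slurmd and other 
system daemons):

NodeName=compute-b-[01-08] CPUs=112 RealMemory=1031057 MemSpecLimit=16384 State=UNKNOWN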

-Paul Edmon-

On 5/12/25 8:55 AM, Xaver Stiensmeier via slurm-users wrote:

Josh,

thank you for your thorough answer. I, too, considered switching to 
CR_Core_Memory after reading into this. Thank you for confirming my suspicion 
that without Memory, we cannot handle high memory requests adequately.

If I may ask: How do you come up with the specific MemSpecLimit? Do you 
handpick a value for each node, have you picked a constant value for all nodes 
or do you take a capped percentage of the maximum memory available?

Best regards,
Xaver

On 5/12/25 14:43, Joshua Randall wrote:
Xaver,

It is my understanding that if we want to have stable systems that don't run 
out of memory, we do need to manage the amount of memory needed for everything 
not running within a slurm job, yes.

In our cluster, we are using `CR_Core_Memory` (so we do constrain our job 
memory) and we set the `RealMemory` to the actual full amount of memory 
available on the machine - I believe these really are given in megabytes (MB), 
not mebibytes (MiB). I think their example of (e.g. "2048") is intended to 
convey this because 2000 MiB is 2048 MB. We set the `MemSpecLimit` for each 
node to set memory aside for everything in the system that is not running 
within a slurm job. This include the slurm daemon itself, the kernel, 
filesystem drivers, metrics collection agents, etc -- anything else we are 
running outside the control of slurm jobs. The `MemSpecLimit` just sets aside 
the specified amount and the result will be that the maximum memory jobs can 
use on the node is (RealMemory - MemSpecLimit). When using cgroups to limit 
memory, slurmd will also be allocated the specified limit so that the slurm 
daemon cannot encroach on job memory. However, note that `MemSpecLimit` is 
documented to not work unless your `SelectTypeParameters` includes Memory as a 
consumable resource.

Since you are using `CR_Core` (which does not configure Memory as a consumable 
resource) then I believe your system will not be constraining job memory at 
all. Jobs can oversubscribe memory as many times over as there are cores, and 
any job would be able to run the machine out of memory by using more than is 
available. With this setting, I guess you could say you don't have to manage 
reserving memory for the OS and slurmd, but only in the sense that any job 
could consume all the memory and cause the system OOM killer to kill a random 
process (including slurmd or something else system critical).

Cheers,

Josh.


--
Dr. Joshua C. Randall
Director of Software Engineering, HPC
Altos Labs
email: jrand...@altoslabs.com



On Mon, May 12, 2025 at 10:27 AM Xaver Stiensmeier via slurm-users 
<slurm-users@lists.schedmd.com> wrote:

Dear Slurm-User List,

currently, in our slurm.conf, we are setting:

SelectType: select/cons_tres
SelectTypeParameters: CR_Core

and in our node configuration RealMemory was basically reduced by an amount to 
make sure the node always had enough RAM to run the OS. However, this is 
apparently not how it is supposed to be done:

Lowering RealMemory with the goal of setting aside some amount for the OS and 
not available for job allocations will not work as intended if Memory is not 
set as a consumable resource in SelectTypeParameters. So one of the *_Memory 
options need to be enabled for that goal to be accomplished. 
(https://slurm.schedmd.com/slurm.conf.html#OPT_RealMemory)

This leads to four questions regarding holding back RAM for worker nodes. 
Answers/help with any of those questions would be appreciated.

1. Is reserving enough RAM for the worker node's OS and slurmd actually a thing 
you h