After assistance from an AWS colleague, GrpTRESMins seems to be working.
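
For anyone who finds this thread later, here is a rough sketch of the kind of setup involved. It is not necessarily the exact change that fixed things here, it assumes accounting through slurmdbd, and the account name "proj1" and the limit value are made up for illustration.

```
# slurm.conf: association limits are only enforced when this includes "limits"
AccountingStorageEnforce=limits

# Set a group limit of 1000 CPU-minutes on the (hypothetical) account proj1
sacctmgr modify account name=proj1 set GrpTRESMins=cpu=1000

# Check the configured limit
sacctmgr show assoc where account=proj1 format=Account,User,GrpTRESMins

# Show the limits together with the usage the controller is tracking
scontrol show assoc_mgr flags=assoc
```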

Hoot

> On Apr 21, 2023, at 4:43 AM, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> 
> wrote:
> 
> Hi Jason,
> 
> On 4/20/23 20:11, Jason Simms wrote:
>> Hello Ole and Hoot,
>> First, Hoot, thank you for your question. I've managed Slurm for a few years 
>> now and still feel like I don't have a great understanding about managing or 
>> limiting resources.
>> Ole, thanks for your continued support of the user community with your 
>> documentation. I do wish not only that more of your information were 
>> contained within the official docs, but also that there were even clearer 
>> discussions around certain topics.
>> As an example, you write that "It is important to configure slurm.conf so 
>> that the locked memory limit isn’t propagated to the batch jobs" by setting 
>> PropagateResourceLimitsExcept=MEMLOCK. It's unclear to me whether you are 
>> suggesting that literally everyone should have that set, or whether it only 
>> applies to certain configurations. We don't have it set, for instance, but 
>> we've not run into trouble with jobs failing due to locked memory errors.
> 
> The link mentioned in the page hopefully explains it: 
> https://slurm.schedmd.com/faq.html#memlock
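> 
> For reference, a minimal sketch of the slurm.conf setting under discussion. 
> Whether a given site needs it depends on its configuration, as the FAQ 
> entry explains:
> 
> ```
> # slurm.conf fragment: do not propagate the submitting shell's
> # locked-memory limit (RLIMIT_MEMLOCK) to batch jobs
> PropagateResourceLimitsExcept=MEMLOCK
> ```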
> 
>> Then, in the official docs, to which you link, it says that "it may also be 
>> desirable to lock the slurmd daemon's memory to help ensure that it keeps 
>> responding if memory swapping begins" by creating /etc/sysconfig/slurm 
>> containing the line SLURMD_OPTIONS="-M". Would there ever be a reason *not* 
>> to include that? That is, I can't think it would ever be desirable for 
>> slurmd to stop responding. So is that another "universal" recommendation, I 
>> wonder?
> 
> I'm not an expert on locking slurmd pages!  The -M option is documented in 
> the slurmd manual page, and I probably read a thread about this long ago on 
> the slurm-users mailing list.  You could try it out in your environment and 
> see if all is well.
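> 
> For anyone trying it, a minimal sketch of the file quoted above. The path and 
> variable name are taken from the quoted docs; whether this file is actually 
> read depends on how slurmd is started on your system, so treat that as an 
> assumption to verify against your service unit or init script:
> 
> ```
> # /etc/sysconfig/slurm
> # -M asks slurmd to lock its pages into memory (see the slurmd man page)
> SLURMD_OPTIONS="-M"
> ```
> 
> slurmd has to be restarted for the option to take effect.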
> 
>> It may be me talking as a new-ish user, but I would find it helpful to have 
>> a concise document laying out common or useful configuration options to 
>> consider when setting up or reconfiguring Slurm. I'm certain that some of my 
>> options are inefficient and that I'm missing others I should have.
> 
> IMHO, most sites have their own requirements and preferences, so I don't 
> think there is a one-size-fits-all Slurm installation solution.
> 
> Since requirements can be so different, and because Slurm is a fantastic 
> piece of software that can be configured for many different scenarios, IMHO 
> a support contract with SchedMD is the best way to get consulting services, 
> get general help, and report bugs.  We have had excellent experiences with 
> SchedMD support (https://www.schedmd.com/support.php).
> 
> Best regards,
> Ole
> 
>> On Thu, Apr 20, 2023 at 2:11 AM Ole Holm Nielsen 
>> <ole.h.niel...@fysik.dtu.dk> wrote:
>>    Hi Hoot,
>>    On 4/20/23 00:15, Hoot Thompson wrote:
>>     > Is there a ‘how to’ or recipe document for setting up and enforcing
>>     > resource limits? I can establish accounts, users, and set limits but
>>     > 'current value' is not incrementing after running jobs.
>>    I have written about resource limits in this Wiki page:
>>    https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#partition-limits
> 

