After assistance from an AWS colleague, GrpTRESMins seems to be working.

Hoot
> On Apr 21, 2023, at 4:43 AM, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk>
> wrote:
>
> Hi Jason,
>
> On 4/20/23 20:11, Jason Simms wrote:
>> Hello Ole and Hoot,
>>
>> First, Hoot, thank you for your question. I've managed Slurm for a few years
>> now and still feel like I don't have a great understanding about managing or
>> limiting resources.
>>
>> Ole, thanks for your continued support of the user community with your
>> documentation. I do wish not only that more of your information were
>> contained within the official docs, but also that there were even clearer
>> discussions around certain topics.
>>
>> As an example, you write that "It is important to configure slurm.conf so
>> that the locked memory limit isn’t propagated to the batch jobs" by setting
>> PropagateResourceLimitsExcept=MEMLOCK. It's unclear to me whether you are
>> suggesting that literally everyone should have that set, or whether it only
>> applies to certain configurations. We don't have it set, for instance, but
>> we've not run into trouble with jobs failing due to locked memory errors.
>
> The link mentioned in the page hopefully explains it:
> https://slurm.schedmd.com/faq.html#memlock
>
>> Then, in the official docs, to which you link, it says that "it may also be
>> desirable to lock the slurmd daemon's memory to help ensure that it keeps
>> responding if memory swapping begins" by creating /etc/sysconfig/slurm
>> containing the line SLURMD_OPTIONS="-M". Would there ever be a reason *not*
>> to include that? That is, I can't think it would ever be desirable for
>> slurmd to stop responding. So is that another "universal" recommendation, I
>> wonder?
>
> I'm not an expert on locking slurmd pages! The -M option is documented in
> the slurmd manual page, and I probably read a thread about it long ago on
> the slurm-users mailing list. You could try it out in your environment and
> see if all is well.
>
>> It may be me talking as a new-ish user, but I would find it helpful to have
>> a concise document laying out common or useful configuration options to
>> consider when setting up or reconfiguring Slurm. I'm certain my
>> configuration has inefficient settings or is missing options that I should
>> have.
>
> IMHO, most sites have their own requirements and preferences, so I don't
> think there is a one-size-fits-all Slurm installation solution.
>
> Since requirements can be so different, and because Slurm is fantastic
> software that can be configured for many different scenarios, IMHO a support
> contract with SchedMD is the best way to get consulting services, get general
> help, and report bugs. We have had excellent experiences with SchedMD support
> (https://www.schedmd.com/support.php).
>
> Best regards,
> Ole
>
>> On Thu, Apr 20, 2023 at 2:11 AM Ole Holm Nielsen
>> <ole.h.niel...@fysik.dtu.dk> wrote:
>>
>> Hi Hoot,
>>
>> On 4/20/23 00:15, Hoot Thompson wrote:
>> > Is there a ‘how to’ or recipe document for setting up and enforcing
>> > resource limits? I can establish accounts, users, and set limits but
>> > 'current value' is not incrementing after running jobs.
>>
>> I have written about resource limits in this Wiki page:
>>
>> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#partition-limits
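
For reference, the settings named in the quoted messages could be sketched as the fragment below. It is only an illustration under stated assumptions, not a recommended configuration: the PropagateResourceLimitsExcept and SLURMD_OPTIONS lines are taken from the thread as written, while the AccountingStorageEnforce line is not mentioned anywhere in the thread and is an assumption about what a site would typically need before association limits such as GrpTRESMins are enforced.

```
# slurm.conf (fragment)

# From the thread: do not propagate the submitting shell's locked-memory
# limit to batch jobs.
PropagateResourceLimitsExcept=MEMLOCK

# Assumption, not from the thread: limits set on associations and QOS
# (GrpTRESMins among them) are enforced only when accounting enforcement
# includes "limits".
AccountingStorageEnforce=limits


# /etc/sysconfig/slurm (separate file)

# From the thread: lock the slurmd daemon's memory so that it keeps
# responding if memory swapping begins.
SLURMD_OPTIONS="-M"
```

Whether any of these lines belongs in a given installation depends on the site, as Ole notes above, so it is worth trying them in a test environment first.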