Hi Marcus,

That makes sense, thanks! I suppose, then (for monitoring purposes, for example, without probing scontrol/sacct), that if you wanted to figure out the true maximum memory limit for a task, you'd need to walk up the cgroup hierarchy and take the smallest value you find.
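For what it's worth, here's a minimal sketch of that walk in Python (assuming the cgroup v1 layout shown below in the thread; the function name and arguments are just illustrative, not anything Slurm provides):

```python
import os

def effective_memory_limit(task_cgroup, controller_root):
    """Walk from a task's cgroup directory up to the memory controller
    root, reading memory.limit_in_bytes at each level (cgroup v1) and
    returning the smallest limit found -- the one that actually binds."""
    path = os.path.abspath(task_cgroup)
    root = os.path.abspath(controller_root)
    limit = None
    while True:
        limit_file = os.path.join(path, "memory.limit_in_bytes")
        if os.path.isfile(limit_file):
            with open(limit_file) as fh:
                value = int(fh.read().strip())
            limit = value if limit is None else min(limit, value)
        parent = os.path.dirname(path)
        if path == root or parent == path:
            return limit
        path = parent
```

So for the job below you'd call it as effective_memory_limit("/sys/fs/cgroup/memory/slurm/uid_2001/job_304876/step_0/task_0", "/sys/fs/cgroup/memory") and get the step's ~32 GB limit back, since the task-level file is effectively unlimited.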
__________________________________________________
*Jacob D. Chappell, CSM*
Research Computing | Research Computing Infrastructure
Information Technology Services | University of Kentucky
jacob.chapp...@uky.edu

On Wed, Jun 23, 2021 at 6:32 AM Marcus Wagner <wag...@itc.rwth-aachen.de> wrote:
> Date: Wed, 23 Jun 2021 13:30:14 +0200
> Subject: Re: [slurm-users] Slurm does not set memory.limit_in_bytes for
> tasks (but does for steps)
>
> Hi Jacob,
>
> I generally think that that is the better way.
> If you have, e.g., tasks with different memory needs, Slurm (or the
> oom_killer, to be precise) would kill the job if that limit gets exceeded.
> If the limit is set only for the step, the tasks can "steal" memory from
> each other.
>
> Best
> Marcus
>
> Am 22.06.2021 um 18:46 schrieb Jacob Chappell:
> > Hello everyone,
> >
> > I came across a weird behavior and was wondering if this is a bug, an
> > oversight, or intended?
> >
> > It appears that Slurm does not set memory.limit_in_bytes at the task
> > level, but it does set it at the step level and above. Observe:
> >
> > $ grep memory /proc/$$/cgroup
> > 10:memory:/slurm/uid_2001/job_304876/step_0/task_0
> >
> > $ cd /sys/fs/cgroup/memory/slurm/uid_2001/job_304876/step_0/task_0
> >
> > $ cat memory.limit_in_bytes
> > 9223372036854771712   <-- effectively unlimited
> >
> > But let's check the parent:
> >
> > $ cat ../memory.limit_in_bytes
> > 33554432000   <-- set properly to around 32 GB, see below
> >
> > $ scontrol show job 304876 | grep mem=
> > TRES=cpu=8,mem=*32000M*,node=1,billing=8
> >
> > Now, it does appear that the task is still limited to the step's memory
> > limit, given the hierarchical nature of cgroups, but I just wanted to
> > mention this anyway and see if anyone had any thoughts.
> > Thanks,
> > __________________________________________________
> > *Jacob D. Chappell, CSM*
> > Research Computing | Research Computing Infrastructure
> > Information Technology Services | University of Kentucky
> > jacob.chapp...@uky.edu
>
> --
> Dipl.-Inf. Marcus Wagner
>
> IT Center
> Group: Linux Systems Group
> Department: Systems and Operations
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-24383
> Fax: +49 241 80-624383
> wag...@itc.rwth-aachen.de
> www.itc.rwth-aachen.de
>
> Social media channels of the IT Center:
> https://blog.rwth-aachen.de/itc/
> https://www.facebook.com/itcenterrwth
> https://www.linkedin.com/company/itcenterrwth
> https://twitter.com/ITCenterRWTH
> https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ