Hi Jacob, I generally think, that that is the better way. If you have e.g. tasks with different memory needs, Slurm (or the oom_killer to be precise) would kill the job, if that limit gets exceeded. If the limit is set for the step, the tasks can "steal" memory from each other.
Best Marcus Am 22.06.2021 um 18:46 schrieb Jacob Chappell:
Hello everyone, I came across a weird behavior and was wondering if this is a bug, oversight, or intended? It appears that Slurm does not set memory.limit_in_bytes at the task level, but it does set it at the step level and above. Observe: $ grep memory /proc/$$/cgroup 10:memory:/slurm/uid_2001/job_304876/step_0/task_0 $ cd /sys/fs/cgroup/memory/slurm/uid_2001/job_304876/step_0/task_0 $ cat memory.limit_in_bytes 9223372036854771712 <--- basically unlimited But lets check the parent: $ cat ../memory.limit_in_bytes 33554432000 <-- set properly to around 32 GB, see below $ scontrol show job 304876 | grep mem= TRES=cpu=8,mem=*32000M*,node=1,billing=8 Now, it does appear that the task is still limited to the step's memory limit given the hierarchical nature of cgroups, but I just wanted to mention this anyway and see if anyone had any thoughts. Thanks, __________________________________________________ *Jacob D. Chappell, CSM* Research Computing | Research Computing Infrastructure Information Technology Services | University of Kentucky jacob.chapp...@uky.edu <mailto:jacob.chapp...@uky.edu>
-- Dipl.-Inf. Marcus Wagner IT Center Gruppe: Systemgruppe Linux Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24383 Fax: +49 241 80-624383 wag...@itc.rwth-aachen.de www.itc.rwth-aachen.de Social Media Kanäle des IT Centers: https://blog.rwth-aachen.de/itc/ https://www.facebook.com/itcenterrwth https://www.linkedin.com/company/itcenterrwth https://twitter.com/ITCenterRWTH https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ
smime.p7s
Description: S/MIME Cryptographic Signature