Ahmet,

Thank you for taking the time to respond to my question.

Yes, the --mem=1GBB is a typo. It's correct in my script, I just fat-fingered it in the email. :-)

BTW, the exact version I am using is 19.05.2.

Regarding your response, it seems that that might be more than what I need. I simply want to enforce the memory limits as specified by the user at job submission time. This seems to have been the behavior in previous versions of Slurm. What I want is what is described in the 19.05 release notes:

RELEASE NOTES FOR SLURM VERSION 19.05
28 May 2019

NOTE: slurmd and slurmctld will now fatal if two incompatible mechanisms for enforcing memory limits are set. This makes incompatible the use of task/cgroup memory limit enforcing (Constrain[RAM|Swap]Space=yes) with JobAcctGatherParams=OverMemoryKill, which could cause problems when a task is killed by one of them while the other is at the same time managing that task. The NoOverMemoryKill setting has been deprecated in favor of OverMemoryKill, since now the default is *NOT* to have any memory enforcement mechanism.

NOTE: MemLimitEnforce parameter has been removed and the functionality that was provided with it has been merged into a JobAcctGatherParams. It may be enabled by setting JobAcctGatherParams=OverMemoryKill, so now job and steps killing by OOM is enabled from the same place.

So, is it really necessary to do what you suggested to get that functionality?

If someone could post just a simple slurm.conf file that forces the memory limits to be honored (and kills the job if they are exceeded), then I could extract what I need from that.

Again, thanks for the assistance.
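To make the question concrete, my reading of those notes is that a slurm.conf fragment along the following lines ought to be enough. This is only a sketch that restates the three settings from my original message, under the assumption that Constrain[RAM|Swap]Space=yes is not set in cgroup.conf. It is not a known-working configuration.

```
# Gather per-job memory usage from /proc (no cgroups involved)
JobAcctGatherType=jobacct_gather/linux

# Per the 19.05 release notes: kill jobs and steps that exceed
# their requested memory
JobAcctGatherParams=OverMemoryKill

# Track processes without cgroups
ProctrackType=proctrack/linuxproc

# Assumption (my reading of the NOTE above): cgroup.conf must not set
# ConstrainRAMSpace=yes or ConstrainSwapSpace=yes alongside
# OverMemoryKill, or slurmd and slurmctld will fatal.
```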
Mike

On Thu, Oct 24, 2019 at 11:27 PM mercan <ahmet.mer...@uhem.itu.edu.tr> wrote:

> Hi;
>
> You should set
>
> SelectType=select/cons_res
>
> plus one of these:
>
> SelectTypeParameters=CR_Memory
> SelectTypeParameters=CR_Core_Memory
> SelectTypeParameters=CR_CPU_Memory
> SelectTypeParameters=CR_Socket_Memory
>
> to enable memory allocation tracking, according to the documentation:
>
> https://slurm.schedmd.com/cons_res_share.html
>
> Also, the line:
>
> #SBATCH --mem=1GBB
>
> contains "1GBB". Is it the same in the job script?
>
> Regards;
>
> Ahmet M.
>
> On 24.10.2019 at 23:00, Mike Mosley wrote:
> > Hello,
> >
> > We are testing Slurm 19.05 on Linux RHEL 7.5+ with the intent to migrate
> > to it from Torque/Moab in the near future.
> >
> > One of the things our users are used to is that when their jobs exceed
> > the amount of memory they requested, the job is terminated by the
> > scheduler. We realize that Slurm prefers to use cgroups to contain
> > rather than kill the jobs, but initially we need to have the kill
> > option in place to transition our users.
> >
> > So, looking at the documentation, it appears that in 19.05, the
> > following needs to be set to accomplish this:
> >
> > JobAcctGatherParams = OverMemoryKill
> >
> > Other possibly relevant settings we made:
> >
> > JobAcctGatherType = jobacct_gather/linux
> >
> > ProctrackType = proctrack/linuxproc
> >
> > We have avoided configuring any cgroup parameters for the time being.
> >
> > Unfortunately, when we submit a job with the following:
> >
> > #SBATCH --nodes=1
> >
> > #SBATCH --ntasks-per-node=1
> >
> > #SBATCH --mem=1GBB
> >
> > we see the RSS of the job steadily increase beyond the 1GB limit, and it is
> > never killed. Interestingly enough, the proc information shows the
> > ulimit (hard and soft) for the process set to around 1GB.
> >
> > We have tried various settings without any success. Can anyone point
> > out what we are doing wrong?
> >
> > Thanks,
> >
> > Mike
> >
> > --
> > J. Michael Mosley
> > University Research Computing
> > The University of North Carolina at Charlotte
> > 9201 University City Blvd
> > Charlotte, NC 28223
> > 704.687.7065
> > jmmos...@uncc.edu <mailto:mmos...@uncc.edu>

--
J. Michael Mosley
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC 28223
704.687.7065
jmmos...@uncc.edu <mmos...@uncc.edu>