Thanks to all who responded! Setting SelectTypeParameters = CR_CPU_Memory in slurm.conf did the trick.
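
For anyone who finds this thread later, here is a rough sketch of the slurm.conf lines involved. Only the SelectTypeParameters line is confirmed in this thread. The other three lines are assumptions about a typical cgroup-based setup and may already be present, or differ, on your cluster.

```conf
# slurm.conf (sketch only; adjust to your site)
SelectType=select/cons_tres          # assumption: a consumable-resource select plugin is in use
SelectTypeParameters=CR_CPU_Memory   # confirmed in this thread: makes memory a consumable resource
TaskPlugin=task/cgroup               # assumption: lets the Constrain* settings in cgroup.conf apply to jobs
ProctrackType=proctrack/cgroup       # assumption: track job processes through cgroups
```

As far as I know, a change to the select plugin settings needs a restart of slurmctld and the slurmd daemons rather than just a reconfigure. Rerunning the memhog job and the sacct query quoted below is one way to check that the limit is being applied.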

On Fri, Jun 23, 2023 at 3:21 AM Shunran Zhang <szh...@ngs.gen-info.osaka-u.ac.jp> wrote:

> Hi
>
> Would you mind checking your job scheduling settings in slurm.conf?
>
> Namely SelectTypeParameters = CR_CPU_Memory or the like.
>
> Also, you may want to use systemd-cgtop to at least confirm jobs are
> indeed running in cgroups.
>
> Sincerely,
> S. Zhang
>
> On Fri, Jun 23, 2023, 12:07 Boris Yazlovitsky <boris...@gmail.com> wrote:
>
>> it's still not constraining memory...
>>
>> a memhog job continues to memhog:
>>
>> boris@rod:~/scripts$ sacct --starttime=2023-05-01 --format=jobid,user,start,elapsed,reqmem,maxrss,maxvmsize,nodelist,state,exit -j 199
>> JobID        User      Start               Elapsed    ReqMem     MaxRSS     MaxVMSize  NodeList        State      ExitCode
>> ------------ --------- ------------------- ---------- ---------- ---------- ---------- --------------- ---------- --------
>> 199          boris     2023-06-23T02:42:30 00:01:21   1M                               milhouse        COMPLETED  0:0
>> 199.batch              2023-06-23T02:42:30 00:01:21              104857988K 104858064K milhouse        COMPLETED  0:0
>>
>> One thing I noticed is that the machines I'm working on do not have
>> libcgroup and libcgroup-dev installed - but slurm does have its own cgroup
>> implementation? The slurmd processes do utilize /usr/lib/slurm/*cgroup.so
>> objects. I will try to recompile slurm with those cgrouplib packages
>> present.
>>
>> On Thu, Jun 22, 2023 at 6:04 PM Ozeryan, Vladimir <vladimir.ozer...@jhuapl.edu> wrote:
>>
>>> No worries,
>>>
>>> No, we don’t have any OS level settings, only “allowed_devices.conf”
>>> which just has /dev/random, /dev/tty and stuff like that.
>>>
>>> But I think this could be the culprit, check out the man page for cgroup.conf:
>>> AllowedRAMSpace=100
>>>
>>> I would just leave these four:
>>>
>>> CgroupAutomount=yes
>>> ConstrainCores=yes
>>> ConstrainDevices=yes
>>> ConstrainRAMSpace=yes
>>>
>>> Vlad.
>>>
>>> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>>> Sent: Thursday, June 22, 2023 5:40 PM
>>> To: Slurm User Community List <slurm-users@lists.schedmd.com>
>>> Subject: Re: [slurm-users] [EXT] --mem is not limiting the job's memory
>>>
>>> APL external email warning: Verify sender slurm-users-boun...@lists.schedmd.com before clicking links or attachments
>>>
>>> thank you Vlad - looks like we have the same yes's
>>>
>>> Do you remember if you had to make any settings on the OS level or in
>>> the kernel to make it work?
>>>
>>> -b
>>>
>>> On Thu, Jun 22, 2023 at 5:31 PM Ozeryan, Vladimir <vladimir.ozer...@jhuapl.edu> wrote:
>>>
>>> Hello,
>>>
>>> We have the following configured and it seems to be working ok.
>>>
>>> CgroupAutomount=yes
>>> ConstrainCores=yes
>>> ConstrainDevices=yes
>>> ConstrainRAMSpace=yes
>>>
>>> Vlad.
>>>
>>> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>>> Sent: Thursday, June 22, 2023 4:50 PM
>>> To: Slurm User Community List <slurm-users@lists.schedmd.com>
>>> Subject: Re: [slurm-users] [EXT] --mem is not limiting the job's memory
>>>
>>> Hello Vladimir, thank you for your response.
>>>
>>> this is the cgroup.conf file:
>>>
>>> CgroupAutomount=yes
>>> ConstrainCores=yes
>>> ConstrainDevices=yes
>>> ConstrainRAMSpace=yes
>>> ConstrainSwapSpace=yes
>>> MaxRAMPercent=90
>>> AllowedSwapSpace=0
>>> AllowedRAMSpace=100
>>> MemorySwappiness=0
>>> MaxSwapPercent=0
>>>
>>> /etc/default/grub:
>>>
>>> GRUB_DEFAULT=0
>>> GRUB_TIMEOUT_STYLE=hidden
>>> GRUB_TIMEOUT=0
>>> GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
>>> GRUB_CMDLINE_LINUX_DEFAULT=""
>>> GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 cgroup_enable=memory swapaccount=1"
>>>
>>> what other cgroup settings need to be set?
>>>
>>> && thank you!
>>>
>>> -b
>>>
>>> On Thu, Jun 22, 2023 at 4:02 PM Ozeryan, Vladimir <vladimir.ozer...@jhuapl.edu> wrote:
>>>
>>> --mem=5G should allocate 5G of memory per node.
>>>
>>> Are your cgroups configured?
>>>
>>> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Boris Yazlovitsky
>>> Sent: Thursday, June 22, 2023 3:28 PM
>>> To: slurm-users@lists.schedmd.com
>>> Subject: [EXT] [slurm-users] --mem is not limiting the job's memory
>>>
>>> Running slurm 22.03.02 on Ubuntu 22.04 server.
>>>
>>> Jobs submitted with --mem=5g are able to allocate an unlimited amount of
>>> memory.
>>>
>>> How can I limit, at the job submission level, how much memory a job can grab?
>>>
>>> thanks, and best regards!
>>> Boris