Yes, I saw the same issue. The default for an unset DefMemPerCPU changed from unlimited in earlier versions to 0. I set it to 384 in slurm.conf so that simple jobs run fine, and I make sure users always request a sane value on submission.
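
For anyone hitting the same thing, here is a minimal sketch of the slurm.conf change. The value is in megabytes per allocated CPU, and 384 is only what I picked for my site; the right number depends on your nodes and workload.

```
# slurm.conf (fragment)
# Default memory per allocated CPU, in MB, used when a job does not
# request memory itself. 384 is an illustrative value, not a recommendation.
DefMemPerCPU=384
```

Users can still override the default per job at submission time with the --mem or --mem-per-cpu options to sbatch or srun.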

On Mon, Jun 11, 2018 at 6:40 PM, Roberts, John E. <jerobe...@anl.gov> wrote:
> I see this in the debug logs:
> "memory per node set to 1M in partition bdwall"
>
> I seemingly can alleviate this if I set RealMemory=foo in the Node
> definitions, but this just seems like something that shouldn't be necessary.
> Did this become a required field after 16.05??
>
> Thanks!
> John
>
> On 6/11/18, 4:12 PM, "Roberts, John E." <jerobe...@anl.gov> wrote:
>
> Nothing I assume isn't correct:
>
> DefMemPerNode = UNLIMITED
> MaxMemPerNode = UNLIMITED
> MemLimitEnforce = Yes
> PropagateResourceLimitsExcept = MEMLOCK
>
> CPU vars aren't set and never were.
>
> Thanks!
> John
>
> On 6/11/18, 4:09 PM, "slurm-users on behalf of Renfro, Michael"
> <slurm-users-boun...@lists.schedmd.com on behalf of ren...@tntech.edu> wrote:
>
> Anything in particular set for DefMemPerCPU in your slurm.conf?
>
> > On Jun 11, 2018, at 3:50 PM, Roberts, John E. <jerobe...@anl.gov> wrote:
> >
> > Hi,
> >
> > Seeing this after an upgrade today. I now can't get any jobs to run.
> > Things were fine before the upgrade. Any ideas?
> >
> > slurmstepd: error: Job 535721 exceeded memory limit (1160 > 1024), being killed
> > slurmstepd: error: Exceeded job memory limit