Are you using pam_limits.so in any of your /etc/pam.d/ configuration files? 
If so, it enforces /etc/security/limits.conf for all users, and those limits 
are usually unlimited for root. Root’s almost always allowed to do things bad 
enough to crash the machine or run it out of resources. If the /etc/pam.d/sshd 
file has pam_limits.so in it, that’s probably where the unlimited setting for 
root is coming from.
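
A quick way to check is to search the PAM service files for the module. The 
following is only a minimal sketch: the helper name pam_limits_users is 
hypothetical, and it assumes the service files live in /etc/pam.d unless 
another directory is passed as the first argument.

```shell
# Hypothetical helper: list PAM service files that load pam_limits.so.
# The directory argument is an assumption and defaults to /etc/pam.d.
pam_limits_users() {
    dir="${1:-/etc/pam.d}"
    # -l prints only the names of matching files, -s hides errors about
    # missing or unreadable files; the pattern skips commented-out lines.
    grep -ls '^[^#]*pam_limits\.so' "$dir"/*
}

# Report the matches, or say so when no service file references the module.
pam_limits_users || echo "pam_limits.so is not referenced under /etc/pam.d"
```

If sshd shows up in the output, the limits seen in an ssh session are being 
applied by pam_limits.so from /etc/security/limits.conf.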

Best,
Bill.

-- 
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435            |   Fax:   (512) 475-9445
 
 

On 4/15/18, 1:26 PM, "slurm-users on behalf of Mahmood Naderan" 
<slurm-users-boun...@lists.schedmd.com on behalf of mahmood...@gmail.com> wrote:

    I have actually disabled the swap partition (!) since the system
    behaves really badly with it enabled, and in my experience I have to
    enter the room and reset the affected machine (!). Otherwise I have
    to wait a long time for it to get back to normal.
    
    When I ssh to the node as root, ulimit -a reports unlimited virtual
    memory. So it seems that root has an unlimited value while users
    have a limited one.
    
    Regards,
    Mahmood
    
    
    
    
    On Sun, Apr 15, 2018 at 10:26 PM, Ole Holm Nielsen
    <ole.h.niel...@fysik.dtu.dk> wrote:
    > Hi Mahmood,
    >
    > It seems your compute node is configured with this limit:
    >
    > virtual memory          (kbytes, -v) 72089600
    >
    > So when the batch job tries to set a higher limit (ulimit -v 82089600)
    > than permitted by the system (72089600), this must surely get rejected,
    > as you have discovered!
    >
    > You may want to reconfigure your compute nodes' limits, for example by
    > setting the virtual memory limit to "unlimited" in your configuration. If
    > the nodes have a very small RAM + swap space size, you might encounter
    > Out Of Memory errors...
    >
    > /Ole
    
    
