You could set /etc/security/limits.conf on every node to contain something like 
(check my syntax):

* soft core 0
* hard core 0

And make sure that /etc/pam.d/slurm.* and /etc/pam.d/system-auth* contain:

session     required      pam_limits.so

…so that limits are enforced for each user session. We have these lines in 
several other PAM files, but those above might be the minimum set for use with 
SLURM and SSH. Both sets of files might not be necessary, but if you allow ssh 
to compute nodes after a job is started, you probably need both.
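
A quick way to sanity-check the result is to print the soft and hard core 
limits both over SSH and inside a job step (just a sketch: "node001" and the 
"-p compute" partition are placeholders for your site, and the srun check 
only reflects these PAM limits if Slurm itself is configured to use PAM, 
e.g. UsePAM=1 in slurm.conf):

ssh node001 'ulimit -S -c; ulimit -H -c'
srun -N1 -p compute bash -c 'ulimit -S -c; ulimit -H -c'

Both commands should report 0 for the soft and hard limit once limits.conf 
and pam_limits.so are in effect.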

Best,
Bill.


-- 
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435            |   Fax:   (512) 475-9445

On 3/21/18, 6:08 AM, "slurm-users on behalf of Ole Holm Nielsen" 
<slurm-users-boun...@lists.schedmd.com on behalf of ole.h.niel...@fysik.dtu.dk> 
wrote:

    We experience problems with MPI jobs dumping lots (1 per MPI task) of 
    multi-GB core dump files, causing problems for file servers and compute 
    nodes.
    
    The user has "ulimit -c 0" in his .bashrc file, but that's ignored when 
    slurmd starts the job, and the slurmd process limits are applied instead.
    
    I should mention that we have decided to configure slurm.conf with
       PropagateResourceLimitsExcept=ALL
    because it's desirable to have rather restrictive user limits on login 
    nodes.  Unfortunately, this means that the user's "ulimit -c 0" isn't 
    propagated to any batch job.
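
    (One possible alternative that I have not tried: slurm.conf also has a 
    PropagateResourceLimits option, the inverse of 
    PropagateResourceLimitsExcept, which propagates only the limits named in 
    the list.  Something like

       PropagateResourceLimits=CORE

    should, if I read the slurm.conf man page correctly, propagate just the 
    user's core limit from the submit host while leaving all other limits at 
    the slurmd defaults.  As far as I know only one of the two options can 
    be set, so it would replace the PropagateResourceLimitsExcept=ALL line 
    above.)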
    
    What's the best way to suppress core dump files from jobs?  Does anyone 
    have good or bad experiences?
    
    One working solution is to modify the slurmd Systemd service file 
    /usr/lib/systemd/system/slurmd.service to add a line:
       LimitCORE=0
    I've documented further details in my Slurm Wiki page 
    https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#slurmd-systemd-limits. 
    However, it's a bit cumbersome to modify the Systemd service file on 
    all compute nodes.
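
    (A systemd drop-in override is one way to at least avoid editing the 
    packaged unit file directly.  This is only a sketch, assuming the 
    standard systemd drop-in layout:

       # /etc/systemd/system/slurmd.service.d/core.conf
       [Service]
       LimitCORE=0

    followed by "systemctl daemon-reload" and a restart of slurmd.  The file 
    still has to reach every compute node, but a drop-in is easier to push 
    out with a configuration management tool than edits to the vendor unit 
    file.)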
    
    Thanks for sharing any experiences.
    
    /Ole
    
    
