Specifying --mem to Slurm only tells it to find a node that has that much
memory, not to enforce a limit, as far as I know. That node has that much, so
Slurm finds it. You probably want to enable UsePAM and set up the pam.d slurm
files and /etc/security/limits.conf to keep users under the 64000MB of
physical memory that the node has (minus some padding for the OS, etc.). Is
UsePAM enabled in your slurm.conf? Maybe that's what's doing it.
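
For what it's worth, the pieces involved look roughly like the sketch below.
Treat the exact PAM lines and limit values as placeholders rather than a
known-good config, and leave some headroom below the node's 64GB for the OS:

    # slurm.conf
    UsePAM=1

    # /etc/pam.d/slurm (and slurm.pam): pam_limits applies limits.conf to job tasks
    session    required    pam_limits.so

    # /etc/security/limits.conf: values in KB, capped a bit under 64GB
    *    hard    rss    62000000
    *    hard    as     62000000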

Best,
Bill.

-- 
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435            |   Fax:   (512) 475-9445
 
 

On 4/15/18, 2:28 PM, "slurm-users on behalf of Mahmood Naderan" 
<slurm-users-boun...@lists.schedmd.com on behalf of mahmood...@gmail.com> wrote:

    Bill,
    The thing is that both the user and root see unlimited virtual memory when
    they ssh directly to the node. However, when the job is submitted, the
    user's limits change. That means Slurm modifies something.
    
    The script is
    
    #!/bin/bash
    #SBATCH --job-name=hvacSteadyFoam
    #SBATCH --output=hvacSteadyFoam.log
    #SBATCH --ntasks=32
    #SBATCH --time=100:00:00
    #SBATCH --mem=64000M
    ulimit -a
    mpirun hvacSteadyFoam -parallel
    
    
    The physical memory on the node is 64GB; therefore, I specified 64000M
    for --mem. Is that correct? The only thing I am guessing is that --mem
    also modifies the virtual memory limit, though I am not sure.
    
    
    Regards,
    Mahmood
    
    
    
    
    On Sun, Apr 15, 2018 at 11:32 PM, Bill Barth <bba...@tacc.utexas.edu> wrote:
    > Mahmood, sorry to presume. I meant to address the root user and your ssh
    > to the node in your example.
    >
    > At our site, we use UsePAM=1 in our slurm.conf, and our /etc/pam.d/slurm
    > and slurm.pam files both contain pam_limits.so, so it could be that way
    > for you, too. I.e., Slurm could be setting the limits for job scripts for
    > your users, while for root SSHes the limits are being set by PAM through
    > another config file. Also, root's limits are potentially set differently
    > by PAM (in /etc/security/limits.conf) or by the kernel at boot time.
    >
    > Finally, users should be careful using ulimit in their job scripts
    > because that can only change the limits for that shell script process and
    > not across nodes. That job script appears to apply to only one node, but
    > if they want different limits for jobs that span nodes, they may need to
    > use other features of Slurm to get them across all the nodes their job
    > wants (cgroups, perhaps?).
    >
    > Best,
    > Bill.
    >
    > --
    > Bill Barth, Ph.D., Director, HPC
    > bba...@tacc.utexas.edu        |   Phone: (512) 232-7069
    > Office: ROC 1.435            |   Fax:   (512) 475-9445
    >
    
    
