Mahmood, sorry to presume. I meant to address the root user and your ssh to the node in your example.
At our site, we set UsePAM=1 in slurm.conf, and both our /etc/pam.d/slurm and slurm.pam files contain pam_limits.so, so it could be that way for you, too. That is, Slurm could be setting the limits your users see inside their job scripts, while for root SSH sessions the limits come from PAM through a different config file. Root's limits may also be set differently from everyone else's by PAM (in /etc/security/limits.conf) or by the kernel at boot time. (Illustrative sketches of the PAM and limits.conf wiring follow at the end of this message.)

Finally, users should be careful using ulimit in their job scripts, because that only changes the limits for that one shell script process, not across nodes. That job script appears to apply to only one node, but if they want particular limits for jobs that span nodes, they may need other Slurm features to get them onto all the nodes their job uses (cgroups, perhaps? sketches of that and of limit propagation also follow below).

Best,
Bill.

--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu  |  Phone: (512) 232-7069
Office: ROC 1.435       |  Fax:   (512) 475-9445

On 4/15/18, 1:41 PM, "slurm-users on behalf of Mahmood Naderan" <slurm-users-boun...@lists.schedmd.com on behalf of mahmood...@gmail.com> wrote:

    Excuse me... I think the problem is not pam.d. How do you interpret
    the following output?

    [hamid@rocks7 case1_source2]$ sbatch slurm_script.sh
    Submitted batch job 53
    [hamid@rocks7 case1_source2]$ tail -f hvacSteadyFoam.log
    max memory size         (kbytes, -m) 65536000
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 4096
    virtual memory          (kbytes, -v) 72089600
    file locks                      (-x) unlimited
    ^C
    [hamid@rocks7 case1_source2]$ squeue
        JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
           53   CLUSTER hvacStea    hamid  R   0:27      1 compute-0-3
    [hamid@rocks7 case1_source2]$ ssh compute-0-3
    Warning: untrusted X11 forwarding setup failed: xauth key data not generated
    Last login: Sun Apr 15 23:03:29 2018 from rocks7.local
    Rocks Compute Node
    Rocks 7.0 (Manzanita)
    Profile built 19:21 11-Apr-2018
    Kickstarted 19:37 11-Apr-2018
    [hamid@compute-0-3 ~]$ ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 256712
    max locked memory       (kbytes, -l) unlimited
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 4096
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
    [hamid@compute-0-3 ~]$

    As you can see, the log file where I put "ulimit -a" before the main
    command says virtual memory is limited. However, when I log in to the
    node, it says unlimited!

    Regards,
    Mahmood

    On Sun, Apr 15, 2018 at 11:01 PM, Bill Barth <bba...@tacc.utexas.edu> wrote:
    > Are you using pam_limits.so in any of your /etc/pam.d/ configuration
    > files? That would enforce /etc/security/limits.conf for all users,
    > whose limits are usually unlimited for root. Root's almost always
    > allowed to do stuff bad enough to crash the machine or run it out of
    > resources. If the /etc/pam.d/sshd file has pam_limits.so in it,
    > that's probably where the unlimited setting for root is coming from.
    >
    > Best,
    > Bill.
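For anyone comparing against the setup described above, here is a minimal sketch of the PAM wiring. The file layout and module stacks are illustrative assumptions; distributions and Slurm versions lay these files out differently, so check your own /etc/pam.d/ before copying anything.

    # slurm.conf -- hand Slurm-spawned tasks to PAM
    UsePAM=1

    # /etc/pam.d/slurm -- illustrative stack; the pam_limits.so line is
    # what applies /etc/security/limits.conf (and limits.d/*) to job tasks
    account   required   pam_unix.so
    session   required   pam_limits.so

    # /etc/pam.d/sshd -- if this stack also loads pam_limits.so, SSH
    # logins (including root's) get their limits from PAM as well
    session   required   pam_limits.so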
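Likewise, a hedged example of how /etc/security/limits.conf can leave root capped differently from ordinary users. The numbers are made up to echo the 72089600 KB value in the job's log, not a recommendation:

    # /etc/security/limits.conf -- example entries only
    # <domain>   <type>   <item>   <value>
    *            hard     as       72089600    # address-space cap (KB) for everyone
    root         hard     as       unlimited   # root left uncapped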
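On the point that ulimit in a job script only affects that one shell: a sketch of a batch script that makes the distinction explicit. srun's --propagate option and the PropagateResourceLimits setting in slurm.conf are the standard Slurm knobs for carrying rlimits to tasks on other nodes; ./my_solver and the specific values here are placeholders.

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks=2

    # This only changes the limit for this batch shell, which runs on
    # the first allocated node:
    ulimit -v 72089600

    # srun launches tasks on every allocated node. Whether they inherit
    # the limit above depends on PropagateResourceLimits in slurm.conf;
    # it can also be requested per step (AS = RLIMIT_AS, the -v limit):
    srun --propagate=AS ./my_solver    # ./my_solver is a placeholder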
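And on the "cgroups, perhaps?" suggestion: a sketch of the task/cgroup route, which constrains a job's memory on every node it runs on regardless of what the shell's ulimits say. These parameter names come from Slurm's cgroup.conf documentation, but verify against your version before enabling them:

    # slurm.conf
    TaskPlugin=task/cgroup

    # cgroup.conf -- constrain each job's tasks on every allocated node
    ConstrainRAMSpace=yes     # cap resident memory via cgroups
    ConstrainSwapSpace=yes    # cap swap usage as well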