Re: [slurm-users] Automatically setting OMP_NUM_THREADS=SLURM_CPUS_PER_TASK?

2018-03-06 Thread Bill Barth
We do the same at TACC in our base module (which happens to be called “TACC”), and then we document it. Best, Bill. -- Bill Barth, Ph.D., Director, HPC bba...@tacc.utexas.edu | Phone: (512) 232-7069 | Office: ROC 1.435 | Fax: (512) 475-9445 On 3/6/18, 5:13 PM, "slurm-

Re: [slurm-users] Automatically setting OMP_NUM_THREADS=SLURM_CPUS_PER_TASK?

2018-03-06 Thread Ryan Novosielski
Thanks again! I’d seen the second one but not the first one. > On Mar 6, 2018, at 6:28 PM, Martin Cuma wrote:

Re: [slurm-users] Automatically setting OMP_NUM_THREADS=SLURM_CPUS_PER_TASK?

2018-03-06 Thread Martin Cuma
MKL tries to be flexible because it has several potential levels of parallelism inside. Having both MKL_NUM_THREADS and OMP_NUM_THREADS can be beneficial in programs where you want to use your own OpenMP threading but restrict MKL's, or vice versa. A good article on the different options that MKL provides is here:
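The split Martin describes can be sketched in a few lines of a job script. This is only a sketch: the thread counts are illustrative, and it relies on MKL's documented behavior that MKL_NUM_THREADS, when set, takes precedence over OMP_NUM_THREADS for MKL's internal threading.

```shell
# Sketch: run your own OpenMP regions on 8 threads while keeping
# MKL's internal BLAS/LAPACK parallelism serial. MKL_NUM_THREADS
# takes precedence over OMP_NUM_THREADS for MKL calls only.
export OMP_NUM_THREADS=8
export MKL_NUM_THREADS=1
echo "OpenMP: $OMP_NUM_THREADS, MKL: $MKL_NUM_THREADS"
```

Swapping the two values gives the opposite split: serial application code around threaded MKL calls.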

Re: [slurm-users] Automatically setting OMP_NUM_THREADS=SLURM_CPUS_PER_TASK?

2018-03-06 Thread Ryan Novosielski
Thanks, Martin. I almost mentioned Utah in my original e-mail, as I turned up your support page in a search. It is good to know definitively that MKL honors that variable; that would be preferable to having to know about several different ones. > On Mar 6, 2018, at 6:07 PM, Martin Cuma wrote:

Re: [slurm-users] Automatically setting OMP_NUM_THREADS=SLURM_CPUS_PER_TASK?

2018-03-06 Thread Martin Cuma
Ryan, we set OMP_NUM_THREADS=1 in the R and Python modules (MKL will honor that), and instruct users who want to run multi-threaded to set OMP_NUM_THREADS themselves after loading the module, making sure they don't oversubscribe the node. In our experience, the majority of R and Python
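The pattern Martin describes can be sketched in plain shell (the module name and the fallback value of 4 are illustrative assumptions, not anything from the thread):

```shell
# What the R/Python module effectively does at load time:
export OMP_NUM_THREADS=1
# A user who wants multi-threading overrides it *after* loading the
# module, sized to the Slurm allocation. The fallback of 4 is an
# arbitrary illustrative value for when SLURM_CPUS_PER_TASK is unset.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-4}
echo "$OMP_NUM_THREADS"
```

Because the override happens after the module load, the module's conservative default wins only for users who do nothing.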

[slurm-users] Automatically setting OMP_NUM_THREADS=SLURM_CPUS_PER_TASK?

2018-03-06 Thread Ryan Novosielski
Hi SLURM users, software compiled against MKL (R, or Python with NumPy/SciPy built against MKL, and probably many other examples) presents a problem when a user makes resource choices via the scheduler that the software then does not respect. Our most recent example is that someone is r
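One common answer to the question in the subject line is to do the propagation in the batch script itself. A minimal sketch (the CPU count and fallback of 1 are illustrative):

```shell
#!/bin/bash
# Hypothetical sbatch script: propagate the scheduler's per-task CPU
# request to OpenMP so MKL-backed R/NumPy code respects it.
#SBATCH --cpus-per-task=8
# Inside a Slurm job, SLURM_CPUS_PER_TASK is set when --cpus-per-task
# was requested; fall back to 1 thread otherwise.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "$OMP_NUM_THREADS"
```

Sites that want this to happen automatically (as TACC and Utah describe elsewhere in this thread) move the same export into a module file or a task prolog instead of each user's script.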

[slurm-users] "stepd terminated due to job not ending with signals"

2018-03-06 Thread Keith Ball
Hi All, I am having an issue with jobs that end, whether by scancel, by being killed at the job's wall-time limit, or even by exiting the shell of an srun --pty interactive session. An excerpt from /var/log/slurmd while a typical job was running: [2018-03-05T12:48:49.165] _run_prolog: run job
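One slurm.conf knob that is often relevant to "stepd terminated" / unkillable-step messages is how long slurmstepd waits for a step's processes to die after signaling them. Whether it applies here depends on the full log; the value below is illustrative, not a recommendation.

```
# slurm.conf sketch (assumed relevant to this symptom; value illustrative)
# Seconds slurmstepd waits for processes to exit after SIGKILL before
# declaring the step unkillable. See slurm.conf(5) for your release.
UnkillableStepTimeout=120
```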

[slurm-users] Overriding 67 Million Job UID limit

2018-03-06 Thread Ron Golberg
Hi all, I would like to implement Slurm on my current HPC system. I have many jobs divided into job arrays, which makes me exceed Slurm's 67-million job ID limit. I've looked into the source code and it looks like the IDs are reused (a 67-million-job cycle), but Slurm can handle identical ID
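For context, the ID range a cluster actually uses is configurable in slurm.conf via FirstJobId and MaxJobId, below a compiled-in ceiling of roughly 67 million that cannot be raised without rebuilding Slurm. A sketch of the relevant settings, with illustrative values:

```
# slurm.conf sketch (illustrative values; the compiled-in MAX_JOB_ID
# ceiling varies by Slurm version -- check slurm.conf(5) for yours)
FirstJobId=1
MaxJobId=67043328
```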