[slurm-users] Slurm version 18.08.0 is now available

2018-08-30 Thread Tim Wickberg
After 9 months of development and testing we are pleased to announce the availability of Slurm version 18.08.0! Downloads are available from https://www.schedmd.com/downloads.php. Thank you to all customers, partners, and community members who contributed to getting this release done. A list o

Re: [slurm-users] how can users start their worker daemons using srun?

2018-08-30 Thread Brian W. Johanson
On 08/29/2018 04:59 PM, Chris Samuel wrote: On Thursday, 30 August 2018 12:45:51 AM AEST Brian W. Johanson wrote: In your example, you do not have enough memory for both sruns at the same time. Nice spot, I think I was thinking in mem-per-task (which doesn't exist) then! Unfortunately fixing
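
As a rough sketch of the fix being discussed (the daemon names and memory figures below are invented, not from the thread), each srun step can be given its own --mem so that two concurrent steps fit inside the job's allocation:

    #!/bin/bash
    #SBATCH -N 1
    #SBATCH --mem=8G            # enough for both steps combined

    # give each concurrent step half of the job's memory
    srun -n1 --mem=4G ./daemon_a &
    srun -n1 --mem=4G ./daemon_b &
    wait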

Re: [slurm-users] [External] Re: Whether I can replace value of the variable when use srun

2018-08-30 Thread Chaofeng Zhang
Sorry, regarding the variable replacement: on 17.02, even if I don’t set CUDA_VISIBLE_DEVICES=NoDevFiles in the srun, the result is the same. From: slurm-users On Behalf Of Chaofeng Zhang Sent: Friday, August 31, 2018 12:16 AM To: Slurm User Community List Subject: [External] Re: [slurm-users] Wheth

Re: [slurm-users] Whether I can replace value of the variable when use srun

2018-08-30 Thread Chaofeng Zhang
I found this is a bug in Slurm 17.11.7; if I run the same command under 17.02, the variable can be replaced. Below is the command run under Slurm 17.02: [root@head ~]# export CUDA_VISIBLE_DEVICES=0,1 [root@head ~]# srun -N1 -n1 --nodelist=head --export=CUDA_VISIBLE_DEVICES=NoDevFiles,ALL env|grep CUDA CUDA_

[slurm-users] Whether I can replace value of the variable when use srun

2018-08-30 Thread Chaofeng Zhang
export CUDA_VISIBLE_DEVICES=0,1 srun -N1 -n1 --nodelist=head --export=CUDA_VISIBLE_DEVICES=NoDevFiles,ALL env|grep CUDA The srun result is CUDA_VISIBLE_DEVICES=0,1; how can I replace CUDA_VISIBLE_DEVICES with NoDevFiles? Thanks.
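
One hedged workaround sometimes tried (not from this thread, and it may still be overridden by the gres/gpu plugin depending on configuration) is to change the variable only in the environment handed to that one srun, relying on srun's default of exporting the caller's environment:

    # set the variable just for this command; srun propagates the environment by default
    CUDA_VISIBLE_DEVICES=NoDevFiles srun -N1 -n1 --nodelist=head env | grep CUDA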

Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Chaofeng Zhang
$ export CUDA_VISIBLE_DEVICES=0,1; srun -N 1 -n 1 --gres=none -p GPU /usr/bin/env |grep CUDA CUDA_VISIBLE_DEVICES=0,1 This result should be CUDA_VISIBLE_DEVICES=NoDevFiles, and it really is NoDevFiles in 17.02. So this must be a bug in 17.11.7. From: slurm-users On Behalf Of Brian W. Johanso

Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Chaofeng Zhang
In my case, the GPU resource is defined in the job file with #SBATCH --gres=gpu:2, so when I use srun, CUDA_VISIBLE_DEVICES=0,1 is already in the shell. I just want to set CUDA_VISIBLE_DEVICES=NoDevFiles in one specific srun; it does not work in 17.11.7, but it works in 17.02. #!/bin/bash #SBATCH
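
A minimal sketch of the kind of job script being described (the original script is truncated above; the program names here are hypothetical):

    #!/bin/bash
    #SBATCH -N 1
    #SBATCH --gres=gpu:2

    # steps that should see both allocated GPUs
    srun -n1 python train.py &

    # one specific step that should see no GPUs
    srun -n1 --export=CUDA_VISIBLE_DEVICES=NoDevFiles,ALL python cpu_only_step.py &
    wait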

Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Brian W. Johanson
And to answer "CUDA_VISBLE_DEVICES can't be set NoDevFiles in Slurm 17.11.7": CUDA_VISIBLE_DEVICES is unset if --gres=none, and if it is set in the user's environment it will remain set to whatever that was. If you really want to see NoDevFiles, set it in /etc/profile.d; it will get clobbered when the r
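
A minimal sketch of that /etc/profile.d suggestion (the filename is made up); it only supplies a default when nothing else has set the variable:

    # /etc/profile.d/cuda_default.sh  (hypothetical filename)
    # default CUDA_VISIBLE_DEVICES to NoDevFiles when it is not already set
    export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-NoDevFiles}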

Re: [slurm-users] SGE to Slurm functionality not supported

2018-08-30 Thread Ryan Cox
sbatch --wrap="command --args" is similar to what you're looking for. Ryan On 08/30/2018 09:12 AM, Anson Abraham wrote: In Sun Grid Engine, there's an option (parameter) of -b "Gives the user the possibility to indicate explicitly whether command should be treated as binary or script. If the v
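
For example (the binary name and arguments below are placeholders), --wrap submits a command without writing a batch script:

    # roughly what qsub -b y ./mybinary --input data.txt does in SGE
    sbatch --wrap="./mybinary --input data.txt"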

[slurm-users] SGE to Slurm functionality not supported

2018-08-30 Thread Anson Abraham
In Sun Grid Engine, there's an option (parameter) of -b “Gives the user the possibility to indicate explicitly whether command should be treated as binary or script. If the value of -b is 'y', then command may be a binary or script. ” I cannot find the equivalent for Slurm. Is there an option

Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread John Hearns
I also remember there being write-only permissions involved when working with cgroups and devices... which bent my head slightly. On Thu, 30 Aug 2018 at 17:02, John Hearns wrote: > Chaofeng, I agree with what Chris says. You should be using cgroups. > > I did a lot of work with cgroups and GPU

Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread John Hearns
Chaofeng, I agree with what Chris says. You should be using cgroups. I did a lot of work with cgroups and GPUs in PBSPro (yes I know... splitter!) With cgroups you only get access to the devices which are allocated to that cgroup, and you get CUDA_VISIBLE_DEVICES set for you. Remember also to lo

Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Renfro, Michael
Chris’ method will set CUDA_VISIBLE_DEVICES like you’re used to, and it will help keep you or your users from picking conflicting devices. My cgroup/GPU settings from slurm.conf: [renfro@login ~]$ egrep -i '(cgroup|gpu)' /etc/slurm/slurm.conf | grep -v '^#' ProctrackType=proctrack/cgroup
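
The actual settings are cut off above; as a hedged sketch, a grep like that on a cgroup/GPU setup typically turns up lines along these lines (the node names and GPU counts below are invented):

    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup
    GresTypes=gpu
    NodeName=gpu[01-02] Gres=gpu:2 ...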

Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Chaofeng Zhang
CUDA_VISIBLE_DEVICES is used by many AI frameworks, such as TensorFlow, to determine which GPU to use, so this environment variable is critical to us. -Original Message- From: slurm-users On Behalf Of Chris Samuel Sent: Thursday, August 30, 2018 4:42 PM To: slurm-users@lists.schedmd.com Subject: [Exte

Re: [slurm-users] serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Chris Samuel
On Thursday, 30 August 2018 6:38:08 PM AEST Chaofeng Zhang wrote: > CUDA_VISIBLE_DEVICES can't be set to NoDevFiles in Slurm 17.11.7. This > worked when we used Slurm 17.02. You probably should be using cgroups instead to constrain access to GPUs. Then it doesn't matter what you set CUDA_VIS
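
A minimal sketch of the cgroup approach Chris is pointing at, assuming the stock file location and TaskPlugin=task/cgroup in slurm.conf (exact options vary by site and Slurm version):

    # /etc/slurm/cgroup.conf
    ConstrainDevices=yes        # steps can only open the GPU device files allocated to them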

[slurm-users] serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Chaofeng Zhang
CUDA_VISIBLE_DEVICES can't be set to NoDevFiles in Slurm 17.11.7. This worked when we used Slurm 17.02. Slurm 17.02: [root@head ~]# export CUDA_VISIBLE_DEVICES=0,1 [root@head ~]# srun -N1 -n1 --gres=none --nodelist=head /usr/bin/env|grep CUDA CUDA_HOME=/usr/local/cuda CUDA_VISIBLE_DEVICES=NoD