Re: [slurm-users] lmod and slurm

2017-12-20 Yair Yarom
Thank you all for your advice and insights. I understand that a fair portion of my time is spent helping the users. However, in cases where the error repeats and I need to re-explain it to a different user each time, I tend to believe there's something wrong with the system configuration. And

Re: [slurm-users] lmod and slurm

2017-12-20 Bill Barth
Yair, you may want to look at using “module reset” rather than a plain purge. Also, the environment variable LMOD_SYSTEM_DEFAULT_MODULES takes a colon-separated list of “default” modules, and reset does a purge followed by an automatic load of that list of modules. We set that variable in our ba
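Bill's suggestion, sketched as a batch script so that the job starts from the site defaults instead of whatever happened to be loaded on the submission node. This is a sketch under assumptions: Lmod is initialised for batch shells on the compute nodes, and the site sets LMOD_SYSTEM_DEFAULT_MODULES centrally (the "StdEnv" value in the comment is a hypothetical name). tensorflow and myprogram are the thread's own examples; the #SBATCH values are placeholders.

```sh
#!/bin/bash
# Placeholder resource request and time limit.
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Sketch only. Assumes Lmod is initialised for batch shells on the compute
# nodes and that the site exports LMOD_SYSTEM_DEFAULT_MODULES centrally,
# e.g. LMOD_SYSTEM_DEFAULT_MODULES="StdEnv" (hypothetical module name).

# "module reset" = purge the modules that came over from the submission
# node, then load the colon-separated LMOD_SYSTEM_DEFAULT_MODULES list.
module reset

# Load what this job needs, on the compute node itself.
module load tensorflow

srun myprogram
```

With this pattern, modules a user loaded interactively no longer leak into the job, although plain variables exported by hand still do under Slurm's default --export=ALL.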

Re: [slurm-users] lmod and slurm

2017-12-19 Gerry Creager - NOAA Affiliate
I have to echo Loris' comments. My users tend to experiment, and a fair portion of my time is spent helping them correct errors they've inflicted upon themselves. I tend to provide guides for configuring and running our more usual applications, and then when they fail, I review the guidance with th

Re: [slurm-users] lmod and slurm

2017-12-19 Loris Bennett
Yair Yarom writes:

> There are two issues:
>
> 1. For modules loaded manually by users, we can (and do)
>    instruct them to load the modules within their sbatch scripts. The
>    problem is that not all users read the documentation properly, so in
>    the tensorflow example, they use t

Re: [slurm-users] lmod and slurm

2017-12-19 Yair Yarom
There are two issues:

1. For modules loaded manually by users, we can (and do) instruct them to load the modules within their sbatch scripts. The problem is that not all users read the documentation properly, so in the tensorflow example, they use the CPU version of tensorflow

Re: [slurm-users] lmod and slurm

2017-12-19 Vanzo, Davide
Is there a way to configure Slurm not to export the environment of the submission node by default?

--
Davide Vanzo, PhD
Application Developer
Adjunct Assistant Professor of Chemical and Biomolecular Engineering
Advanced Computing Center for Research and Education (ACCRE)
www.accre.vanderbilt.edu

Re: [slurm-users] lmod and slurm

2017-12-19 Jeffrey Frey
Don't propagate the submission environment:

srun --export=NONE myprogram

> On Dec 19, 2017, at 8:37 AM, Yair Yarom wrote:
>
> Thanks for your reply,
>
> The problem is that users are running on the submission node, e.g.:
>
> module load tensorflow
> srun myprogram
>
> So they get the tens
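Why the default bites: both srun and sbatch default to --export=ALL, so whatever `module load tensorflow` put into PATH or PYTHONPATH on the submission node is handed to the job unchanged. The sketch below mimics the two behaviours on a single machine with a child shell: a plain child inherits the exported variable, while `env -i` starts it from an empty environment, which is roughly what --export=NONE does (Slurm still adds its own SLURM_* variables). It is a local analogy only, not Slurm itself, and the PYTHONPATH value is made up.

```sh
#!/bin/sh
# Local analogy only (no Slurm involved): an inheriting child process vs a
# child started from an empty environment.

# Pretend "module load tensorflow" on the submission node set this
# (hypothetical path).
PYTHONPATH=/opt/tensorflow-cpu/lib/python
export PYTHONPATH

# Like the default --export=ALL: the child sees the exported variable.
inherited=$(sh -c 'echo "${PYTHONPATH:-<unset>}"')

# Roughly like --export=NONE: env -i clears the environment first.
cleared=$(env -i /bin/sh -c 'echo "${PYTHONPATH:-<unset>}"')

echo "inherited: $inherited"
echo "cleared:   $cleared"
```

On Davide's question about changing the default: sbatch and srun also read the SBATCH_EXPORT and SLURM_EXPORT_ENV environment variables as equivalents of --export, so a site or user profile could set them. Check the sbatch and srun man pages for your Slurm version before relying on this, in particular how a NONE setting affects srun calls made inside the batch script.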

Re: [slurm-users] lmod and slurm

2017-12-19 Loris Bennett
Hi Yair,

Can't the users just use sbatch and put

#SBATCH --constraint=shiny_and_new
module purge
module add ${SLURM_CONSTRAINT}
srun myprogram

in their batch scripts?

Loris

Yair Yarom writes:

> Thanks for your reply,
>
> The problem is that users are running on the submission node

Re: [slurm-users] lmod and slurm

2017-12-19 Yair Yarom
Thanks for your reply,

The problem is that users are running on the submission node, e.g.:

module load tensorflow
srun myprogram

So they get the submission node's version of tensorflow (with its PATH/PYTHONPATH), plus any additional default modules. There is never a chance t
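The underlying complaint is that nothing stops users from submitting with modules already loaded. One hedged option is a check that runs at submission time, for example from a wrapper script or a login-node profile (neither is something the thread describes). The sketch relies on Lmod recording the loaded modules as a colon-separated list in LOADEDMODULES; the function name warn_loaded_modules and the message wording are hypothetical.

```sh
#!/bin/sh
# Hypothetical submission-side check (not part of Slurm or Lmod).
# Lmod stores the currently loaded modules as a colon-separated list in
# LOADEDMODULES; with Slurm's default --export=ALL their PATH/PYTHONPATH
# changes would follow the job to the compute nodes.

warn_loaded_modules() {
  local loaded count mods
  loaded="${LOADEDMODULES:-}"
  if [ -z "$loaded" ]; then
    return 0    # clean shell: nothing to warn about
  fi
  # "a:b:c" -> "a b c" for display, and count the non-empty entries.
  mods=$(printf '%s\n' "$loaded" | tr ':' ' ')
  count=$(printf '%s\n' "$loaded" | tr ':' '\n' | grep -c .)
  echo "warning: $count module(s) loaded on the submission node: $mods" >&2
  echo "warning: load them inside the batch script instead (after 'module purge')" >&2
  return 1
}
```

A wrapper could call warn_loaded_modules before invoking sbatch and either just print the warning or refuse to submit; which of the two is a site policy choice.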

Re: [slurm-users] lmod and slurm

2017-12-19 Loris Bennett
Hi Yair,

Yair Yarom writes:

> Hi list,
>
> We use lmod[1] here for some software/version management. We have
> encountered two issues (so far):
>
> 1. The submission node can have different software than the execution
>    nodes: different CPU, different GPU (if any), InfiniBand, etc. When