Hi Yair,

Can't the users just use sbatch and put

  #SBATCH --constraint=shiny_and_new

  module purge
  module add ${SLURM_CONSTRAINT}

  srun myprogram

in their batch scripts?

Loris

Yair Yarom <ir...@cs.huji.ac.il> writes:

> Thanks for your reply,
>
> The problem is that users are running on the submission node, e.g.
>
>   module load tensorflow
>   srun myprogram
>
> So they get the submission node's version of tensorflow (and the
> corresponding PATH/PYTHONPATH), plus any additional default modules.
>
> There is never a chance to run "module add ${SLURM_CONSTRAINT}" or to
> remove the unwanted modules that were loaded (possibly automatically)
> on the submission node and don't work on the execution node.
>
> Thanks,
> Yair.
>
> On Tue, Dec 19 2017, "Loris Bennett" <loris.benn...@fu-berlin.de> wrote:
>
>> Hi Yair,
>>
>> Yair Yarom <ir...@cs.huji.ac.il> writes:
>>
>>> Hi list,
>>>
>>> Here we use Lmod[1] for some software/version management. We have
>>> encountered two issues (so far):
>>>
>>> 1. The submission node can differ from the execution nodes in
>>>    software and hardware (CPU, GPU if any, InfiniBand, etc.). When a
>>>    user runs 'module load something' on the submission node, the
>>>    wrong environment is passed to the task on the execution node,
>>>    e.g. "module load tensorflow" can load a different version
>>>    depending on the node.
>>>
>>> 2. There are some modules we want to load by default, and again these
>>>    can differ between nodes (we do this by sourcing /etc/lmod/lmodrc
>>>    and ~/.lmodrc).
>>>
>>> For issue 1, we instruct users to run "module load" in their batch
>>> script rather than before running sbatch, but issue 2 is more
>>> problematic.
>>>
>>> My current solution is a TaskProlog script that runs "module purge"
>>> and "module load" and then exports/unsets the changed environment
>>> variables. I was wondering whether anyone has encountered this issue
>>> and has a less cumbersome solution.
>>>
>>> Thanks in advance,
>>> Yair.
>>>
>>> [1] https://www.tacc.utexas.edu/research-development/tacc-projects/lmod
>>
>> I don't fully understand your use case, but, assuming you can divide
>> your nodes up by some feature, could you define a module per feature
>> which just loads the specific modules needed for that category? E.g. in
>> the batch file you would have
>>
>>   #SBATCH --constraint=shiny_and_new
>>
>>   module add ${SLURM_CONSTRAINT}
>>
>> and a module file 'shiny_and_new' with contents like, say,
>>
>>   module add tensorflow/2.0
>>   module add cuda/9.0
>>
>> whereas the module 'rusty_and_old' would contain
>>
>>   module add tensorflow/0.1
>>   module add cuda/0.2
>>
>> Would that help?
>>
>> Cheers,
>>
>> Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
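For reference, the TaskProlog approach Yair describes might look roughly like the following. This is only a minimal sketch, not Yair's actual script. It assumes bash 4 or later, GNU "env -0", that the TaskProlog inherits the task's environment, and that Lmod's bash initialization lives at /etc/profile.d/lmod.sh (a hypothetical, site-specific path). It relies on Slurm interpreting "export NAME=value" and "unset NAME" lines on the TaskProlog's standard output; emit_env_diff is a made-up helper that compares snapshots of the environment taken before and after resetting the modules.

```shell
#!/bin/bash
# Hypothetical TaskProlog sketch: reset the Lmod environment on the execution node.
# Slurm reads this script's stdout: "export NAME=value" sets a variable in the
# task's environment and "unset NAME" removes one.

valid_name='^[A-Za-z_][A-Za-z0-9_]*$'

# Print export/unset lines that turn the environment snapshot in file $1 into
# the one in file $2. Both files are NUL-separated, as written by `env -0`.
emit_env_diff() {
    local -A old=() new=()
    local entry name
    while IFS= read -r -d '' entry; do
        name=${entry%%=*}
        # Skip exported shell functions (e.g. BASH_FUNC_module%%) and other odd names.
        if [[ $name =~ $valid_name ]]; then old[$name]=${entry#*=}; fi
    done < "$1"
    while IFS= read -r -d '' entry; do
        name=${entry%%=*}
        if [[ $name =~ $valid_name ]]; then new[$name]=${entry#*=}; fi
    done < "$2"
    for name in "${!new[@]}"; do
        # Slurm parses stdout line by line, so multi-line values are skipped.
        [[ ${new[$name]} == *$'\n'* ]] && continue
        if [[ -z ${old[$name]+set} || ${old[$name]} != "${new[$name]}" ]]; then
            printf 'export %s=%s\n' "$name" "${new[$name]}"
        fi
    done
    for name in "${!old[@]}"; do
        if [[ -z ${new[$name]+set} ]]; then printf 'unset %s\n' "$name"; fi
    done
}

main() {
    local before after
    before=$(mktemp) && after=$(mktemp) || return 1
    env -0 > "$before"
    # ASSUMPTION: site-specific path to Lmod's bash initialization.
    source /etc/profile.d/lmod.sh
    # Keep module output off stdout, which Slurm parses.
    module purge >&2
    # Reload this node's default modules, as done at login.
    source /etc/lmod/lmodrc >&2
    if [[ -r ~/.lmodrc ]]; then source ~/.lmodrc >&2; fi
    env -0 > "$after"
    emit_env_diff "$before" "$after"
    rm -f "$before" "$after"
}

# Only run the prolog logic inside a Slurm job; otherwise just define the functions.
if [[ -n ${SLURM_JOB_ID:-} ]]; then
    main
fi
```

The script would be enabled with a TaskProlog=/path/to/script line in slurm.conf. Exported shell functions (such as Lmod's own module function) and values containing newlines are not passed on, since Slurm's line-based format cannot carry them safely.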