Hi Yair,

Yair Yarom <ir...@cs.huji.ac.il> writes:

> Hi list,
>
> We use here lmod[1] for some software/version management. There are two
> issues encountered (so far):
>
> 1. The submission node can have different software than the execution
>    nodes - different cpu, different gpu (if any), infiniband, etc. When
>    a user runs 'module load something' on the submission node, it will
>    pass the wrong environment to the task in the execution
>    node. e.g. "module load tensorflow" can load a different version
>    depending on the nodes.
>
> 2. There are some modules we want to load by default, and again this can
>    be different between nodes (we do this by source'ing /etc/lmod/lmodrc
>    and ~/.lmodrc).
>
> For issue 1, we instruct users to run the "module load" in their batch
> script and not before running sbatch, but issue 2 is more problematic.
>
> My current solution is to write a TaskProlog script that runs "module
> purge" and "module load" and export/unset the changed environment
> variables. I was wondering if anyone encountered this issue and have a
> less cumbersome solution.
>
> Thanks in advance,
> Yair.
>
> [1] https://www.tacc.utexas.edu/research-development/tacc-projects/lmod

I don't fully understand your use-case, but, assuming you can divide
your nodes up by some feature, could you define a module per feature
which just loads the specific modules needed for that category, e.g.
in the batch file you would have

    #SBATCH --constraint=shiny_and_new

    module add ${SLURM_CONSTRAINT}

and would have a module file 'shiny_and_new', with contents like, say,

    module add tensorflow/2.0
    module add cuda/9.0

whereas the module 'rusty_and_old' would contain

    module add tensorflow/0.1
    module add cuda/0.2

Would that help?

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email loris.benn...@fu-berlin.de