Is there a way to configure Slurm not to export the environment of the 
submission node by default?

--
Davide Vanzo, PhD
Application Developer
Adjunct Assistant Professor of Chemical and Biomolecular Engineering
Advanced Computing Center for Research and Education (ACCRE)
www.accre.vanderbilt.edu


On 2017-12-19 08:12:39-06:00 Jeffrey Frey wrote:

Don't propagate the submission environment:

srun --export=NONE myprogram
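The question at the top of the thread asks how to make this the default instead of something each user has to remember. One option is to set the input environment variables that sbatch and srun read as defaults for `--export`. Below is a minimal sketch. It assumes a site-wide shell profile is sourced on the submission node; the file name is hypothetical.

```shell
# Hypothetical site-wide file, e.g. /etc/profile.d/slurm-export.sh on the
# submission node(s). The path is an assumption: any file sourced by
# users' login shells would do.
#
# sbatch reads SBATCH_EXPORT and srun reads SLURM_EXPORT_ENV as the default
# for --export, so jobs start without the submission node's environment
# unless the user asks for it explicitly (e.g. sbatch --export=ALL).
export SBATCH_EXPORT=NONE
export SLURM_EXPORT_ENV=NONE
```

According to the sbatch man page, input environment variables override `#SBATCH` directives in a script, but an explicit `--export=...` on the command line still takes precedence. srun also reads `SLURM_EXPORT_ENV` inside a batch job, so check that it does not end up in the job's environment. If it does, the `module load` lines in a batch script will not reach the job steps unless srun is given `--export=ALL`.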



> On Dec 19, 2017, at 8:37 AM, Yair Yarom <ir...@cs.huji.ac.il> wrote:
>
>
> Thanks for your reply,
>
> The problem is that users run something like this on the submission node:
>
> module load tensorflow
> srun myprogram
>
> So they get the submission node's version of tensorflow (and its
> PATH/PYTHONPATH), plus any additional default modules loaded there.
>
> There is never a chance to run the "module add ${SLURM_CONSTRAINT}" or
> remove the unwanted modules that were loaded (maybe automatically) on
> the submission node and aren't working on the execution node.
>
> Thanks,
>    Yair.
>
> On Tue, Dec 19 2017, "Loris Bennett" <loris.benn...@fu-berlin.de> wrote:
>
>> Hi Yair,
>>
>> Yair Yarom <ir...@cs.huji.ac.il> writes:
>>
>>> Hi list,
>>>
>>> We use lmod[1] here for software/version management. We have
>>> encountered two issues (so far):
>>> issues encountered (so far):
>>>
>>> 1. The submission node can have different software than the execution
>>>    nodes (different CPU, different GPU if any, InfiniBand, etc.). When
>>>    a user runs 'module load something' on the submission node, the
>>>    wrong environment is passed to the task on the execution node,
>>>    e.g. "module load tensorflow" can load a different version
>>>    depending on the node.
>>>
>>> 2. There are some modules we want to load by default, and again these
>>>    can differ between nodes (we do this by sourcing /etc/lmod/lmodrc
>>>    and ~/.lmodrc).
>>>
>>> For issue 1, we instruct users to run "module load" in their batch
>>> script rather than before running sbatch, but issue 2 is more
>>> problematic.
>>>
>>> My current solution is a TaskProlog script that runs "module purge"
>>> and "module load" and then exports/unsets the changed environment
>>> variables. I was wondering whether anyone has encountered this issue
>>> and has a less cumbersome solution.
>>>
>>> Thanks in advance,
>>>    Yair.
>>>
>>> [1] https://www.tacc.utexas.edu/research-development/tacc-projects/lmod
>>
>> I don't fully understand your use case, but, assuming you can divide
>> your nodes up by some feature, could you define a module per feature
>> which just loads the specific modules needed for that category, e.g. in
>> the batch file you would have
>>
>>   #SBATCH --constraint=shiny_and_new
>>
>>   module add ${SLURM_CONSTRAINT}
>>
>> and would have a module file 'shiny_and_new', with contents like, say,
>>
>>  module add tensorflow/2.0
>>  module add cuda/9.0
>>
>> whereas the module 'rusty_and_old' would contain
>>
>>  module add tensorflow/0.1
>>  module add cuda/0.2
>>
>> Would that help?
>>
>> Cheers,
>>
>> Loris
>


::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::
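Yair's current approach is a TaskProlog that runs "module purge" and "module load" and then exports/unsets the changed variables. It can be sketched as below, under stated assumptions. This is not his script. The slurm.conf path and the Lmod init path are hypothetical. The script needs GNU `env -0` and `grep -z`. It relies on the TaskProlog convention that slurmd turns `export NAME=value` and `unset NAME` lines on stdout into environment changes for the task. It also assumes slurmd runs the TaskProlog with the task's environment, so the first snapshot reflects what the task would otherwise inherit.

```shell
#!/bin/bash
# Sketch of a TaskProlog along the lines Yair describes. Assumptions (not
# from the thread): installed via slurm.conf, e.g.
#   TaskProlog=/etc/slurm/taskprolog.sh    (hypothetical path)
# Lmod's shell init lives at /etc/profile.d/lmod.sh on execution nodes
# (hypothetical path), and GNU env/grep are available.
#
# slurmd reads this script's stdout: "export NAME=value" sets NAME in the
# task's environment and "unset NAME" removes it.

# emit_env_diff BEFORE AFTER
# BEFORE and AFTER are snapshots written by `env -0` (NUL-separated
# NAME=value entries). Prints one directive per plain variable that was
# added, changed or removed between the two snapshots.
emit_env_diff() {
    local before=$1 after=$2 entry name
    # Added or changed variables.
    while IFS= read -r -d '' entry; do
        name=${entry%%=*}
        # Skip exported shell functions (e.g. Lmod's BASH_FUNC_module%%)
        # and multi-line values: each directive must fit on one line.
        [[ $name =~ ^[A-Za-z_][A-Za-z0-9_]*$ && $entry != *$'\n'* ]] || continue
        grep -qzxF -- "$entry" "$before" || printf 'export %s\n' "$entry"
    done < "$after"
    # Removed variables.
    while IFS= read -r -d '' entry; do
        name=${entry%%=*}
        [[ $name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
        grep -qz -- "^${name}=" "$after" || printf 'unset %s\n' "$name"
    done < "$before"
}

taskprolog_main() {
    local before after
    before=$(mktemp) && after=$(mktemp) || return 1
    env -0 > "$before"
    # Rebuild the module environment for *this* node. Anything these print
    # goes to stderr so slurmd does not parse it as a directive.
    {
        source /etc/profile.d/lmod.sh   # hypothetical Lmod init path
        module purge
        source /etc/lmod/lmodrc         # site defaults, as in the thread
        [[ -r ~/.lmodrc ]] && source ~/.lmodrc
    } >&2
    env -0 > "$after"
    emit_env_diff "$before" "$after"
    rm -f "$before" "$after"
}

# Only do the work when started for a job step (SLURM_JOB_ID is set), so
# the functions above can also be sourced and tested outside Slurm.
if [[ -n ${SLURM_JOB_ID:-} ]]; then
    taskprolog_main
fi
```

Snapshotting with `env -0` rather than plain `env` keeps multi-line values, such as the `module` shell function Lmod exports, from being split into bogus entries. Because only the differences are emitted, variables that Slurm itself set for the task pass through untouched.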




