Thank you all for your advice and insights.
I understand that a fair portion of my time is spent on helping the
users. However, in cases where the error repeats and I need to re-explain
it to a different user each time, I tend to believe there's something
wrong with the system configuration. And
Yair,
You may want to look at using “module reset” rather than a plain purge. Also,
the environment variable LMOD_SYSTEM_DEFAULT_MODULES takes a colon-separated
list of “default” modules, and reset does a purge followed by an automatic load
of that list of modules. We set that variable in our ba
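
A minimal sketch of the setup described above, for illustration only. The
module names gcc and openmpi are hypothetical placeholders, and where the
variable gets set (a shell startup file, a profile.d fragment, etc.) is an
assumption that depends on the site:

```shell
# Hypothetical default modules; substitute names that exist on your system.
export LMOD_SYSTEM_DEFAULT_MODULES="gcc:openmpi"

# "module" is a shell function provided by Lmod. Only call it where Lmod
# has actually been initialized in this shell.
if type module >/dev/null 2>&1; then
    # reset = purge, then load everything in LMOD_SYSTEM_DEFAULT_MODULES
    module reset || echo "module reset failed" >&2
fi
```

Users can then run "module reset" at the top of their batch scripts to get
back to the site defaults regardless of what was loaded at submission time.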
I have to echo Loris' comments. My users tend to experiment, and a fair
portion of my time is spent helping them correct errors they've inflicted
upon themselves. I tend to provide guides for configuring and running our
more usual applications, and then when they fail, I review the guidance
with th
Yair Yarom writes:
> There are two issues:
>
> 1. For modules loaded manually by users, we can (and do)
>instruct them to load the modules within their sbatch scripts. The
>problem is that not all users read the documentation properly, so in
>the tensorflow example, they use t
There are two issues:
1. For modules loaded manually by users, we can (and do)
instruct them to load the modules within their sbatch scripts. The
problem is that not all users read the documentation properly, so in
the tensorflow example, they use the CPU version of tensorflow
Is there a way to configure Slurm not to export the environment of the
submission node by default?
--
Davide Vanzo, PhD
Application Developer
Adjunct Assistant Professor of Chemical and Biomolecular Engineering
Advanced Computing Center for Research and Education (ACCRE)
www.accre.vanderbilt.edu
Don't propagate the submission environment:
srun --export=NONE myprogram
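
To make this the default rather than something each user has to remember,
one possible approach is to set the environment variables that srun and
sbatch read as equivalents of --export. This is a sketch, not a tested
site configuration: the file location is an assumption, and the exact
behaviour of --export=NONE should be checked against the man pages of the
installed Slurm version.

```shell
# Sketch of a site-wide profile fragment
# (hypothetical location: /etc/profile.d/slurm_export.sh).

# Input environment variable read by srun; same meaning as --export.
export SLURM_EXPORT_ENV=NONE

# Input environment variable read by sbatch; same meaning as --export.
export SBATCH_EXPORT=NONE
```

Options given on the command line take precedence over these variables, so
a user who really wants the submission environment can still pass
--export=ALL explicitly. With NONE, the program generally needs to be given
by absolute path, since PATH is not propagated.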
> On Dec 19, 2017, at 8:37 AM, Yair Yarom wrote:
>
>
> Thanks for your reply,
>
> The problem is that users are running on the submission node e.g.
>
> module load tensorflow
> srun myprogram
>
> So they get the tens
Hi Yair,
Can't the users just use sbatch and put
#SBATCH --constraint=shiny_and_new
module purge
module add ${SLURM_CONSTRAINT}
srun myprogram
in their batch scripts?
Loris
Yair Yarom writes:
> Thanks for your reply,
>
> The problem is that users are running on the submission node
Thanks for your reply,
The problem is that users are running on the submission node e.g.
module load tensorflow
srun myprogram
So they get the submission node's version of tensorflow (and its
PATH/PYTHONPATH), along with any additional default modules.
There is never a chance t
Hi Yair,
Yair Yarom writes:
> Hi list,
>
> We use Lmod[1] here for some software/version management. There are two
> issues encountered (so far):
>
> 1. The submission node can have different software than the execution
>nodes - different cpu, different gpu (if any), infiniband, etc. When
>