Hi Sam,
This is expected, and it is how bash works.
Regarding the #SBATCH --output problem, this does seem to be an error,
because only one output file is created (I just tested it myself).
Regarding variable substitution:
srun echo SLURMD_NODENAME:$SLURMD_NODENAME \
    SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID \
    SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID SLURM_JOB_ID:$SLURM_JOB_ID \
    SLURM_TASK_PID:$SLURM_TASK_PID
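Bash expands the variables before the actual program is started.
Otherwise, e.g., "cd $HOME" would not work: on most Unix-like systems
there is no directory literally named $HOME; instead, the variable HOME
holds the path to the user's home directory.
In your batch script, this expansion happens once, in the shell that runs
the script on the first node of the allocation (node3), so srun hands
every task the already-expanded text.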
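A minimal sketch of the cd $HOME point, runnable in any bash without Slurm. Assumption: /tmp stands in for the real home directory; any existing directory would do.

```bash
#!/bin/bash
# Sketch: bash replaces $HOME with its value *before* cd runs.
# Assumption: /tmp stands in for the user's home directory.
fake_home=/tmp

# "$HOME": bash expands it, so cd receives the actual path of the directory.
expanded_dir=$(HOME="$fake_home"; cd "$HOME" && pwd)
echo "cd \$HOME   -> $expanded_dir"

# '$HOME' in single quotes: cd receives the literal text $HOME. There is no
# directory of that name under /, so cd fails.
if (cd / && cd '$HOME') 2>/dev/null; then
  literal_ok=yes
else
  literal_ok=no
fi
echo "cd '\$HOME' succeeded? $literal_ok"
```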
So what you are actually running is:
srun echo SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: \
    SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
This is exactly the output you received.
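To see exactly which arguments srun gets, here is a small sketch. show_argv is a hypothetical helper (not part of Slurm) standing in for srun and printing one argument per line; the variable values are set by hand from your output, so no cluster is needed.

```bash
#!/bin/bash
# Sketch: print the arguments a program receives after bash has expanded
# the command line. show_argv is a hypothetical stand-in for srun.
show_argv() {
  local arg
  for arg in "$@"; do
    printf '%s\n' "$arg"
  done
}

# Values copied from the thread's output (shell on the first node).
SLURMD_NODENAME=node3
SLURM_JOB_ID=2056
SLURM_TASK_PID=6441
unset SLURM_ARRAY_TASK_ID SLURM_ARRAY_JOB_ID   # not an array job: expand to nothing

argv=$(show_argv echo SLURMD_NODENAME:$SLURMD_NODENAME \
  SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID \
  SLURM_JOB_ID:$SLURM_JOB_ID SLURM_TASK_PID:$SLURM_TASK_PID)
printf '%s\n' "$argv"
```

The string node3 is already fixed in the arguments, so every task on every node can only echo node3.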
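Here's what you could try:
srun bash -c 'echo SLURMD_NODENAME:$SLURMD_NODENAME SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID SLURM_JOB_ID:$SLURM_JOB_ID SLURM_TASK_PID:$SLURM_TASK_PID'
The single quotes (not backticks!) prevent the batch script's bash from
expanding the variables. Since srun starts echo directly, without a shell,
the bash -c is what expands them instead, once per task on that task's node.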
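A sketch of who expands $SLURMD_NODENAME, runnable without Slurm. Assumption: an environment prefix (VAR=value cmd) stands in for srun setting a different SLURMD_NODENAME for each task on its node; node5 is just an illustrative value. Your Bash3 workaround, bash -c "echo Bash3: \$SLURMD_NODENAME", behaves like the single-quoted case, because the backslash also keeps the outer bash from expanding the variable.

```bash
#!/bin/bash
# Assumption: "SLURMD_NODENAME=node5 cmd" mimics srun giving a task on node5
# its own environment; the plain assignment mimics the batch script's shell.
SLURMD_NODENAME=node3

# Double quotes: the outer bash expands the variable first -> node3.
outer=$(SLURMD_NODENAME=node5 bash -c "echo $SLURMD_NODENAME")

# Single quotes: the outer bash passes the text through unchanged; the inner
# bash expands it from its own environment -> node5.
inner=$(SLURMD_NODENAME=node5 bash -c 'echo $SLURMD_NODENAME')

# No inner shell: env executes echo directly (as srun does), so nothing
# expands the variable and the literal text $SLURMD_NODENAME is printed.
literal=$(SLURMD_NODENAME=node5 env echo '$SLURMD_NODENAME')

printf 'double quotes: %s\nsingle quotes + bash -c: %s\nsingle quotes, no shell: %s\n' \
  "$outer" "$inner" "$literal"
```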
Best
Marcus
On 07/13/2018 06:54 PM, Sam wrote:
Stack Overflow thread:
https://stackoverflow.com/questions/51328917/slurm-sbatch-multiple-nodes-same-slurmd-nodename
possibly related to:
https://groups.google.com/forum/#!topic/slurm-users/suclnO2V0aA
- slurm-wlm 17.11.2
- Installed from the Ubuntu APT repositories, Ubuntu 18.04
We have a cluster of 20 identical nodes.
Running the simple script below gives me a confusing result.
All the tasks think they are running on node3, while running the
hostname command gives the correct answer. This is also a problem for
the output filename: I expected to get many different output files, but I
get just one, with 'node3' in the filename. This seems to be a Bash
eval / variable substitution error.
Wrapping $SLURMD_NODENAME in
bash -c "echo Bash3: \$SLURMD_NODENAME"
works. But why did I have to do this? This workaround won't work for
the #SBATCH --output option.
cn.job:
#!/bin/bash
#SBATCH --output=/share/output.txt.%j.%J.%a.%A.%n.%N.%s.%t.%x
#SBATCH --time=00:00:30
#SBATCH --tasks-per-node=2
#SBATCH --nodes=4
srun hostname
srun bash -c "echo Bash2: \$(hostname)"
srun echo SLURMD_NODENAME:$SLURMD_NODENAME \
    SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID \
    SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID SLURM_JOB_ID:$SLURM_JOB_ID \
    SLURM_TASK_PID:$SLURM_TASK_PID
srun bash -c "echo Bash3: \$SLURMD_NODENAME"
srun sleep 20
Running it with
sbatch cn.job
produces a single output file,
/share/output.txt.2056.2056.4294967294.2056.0.node3.4294967294.0.cn.job
with this content:
node3
node3
node6
node4
node5
node6
node4
node5
Bash2: node3
Bash2: node6
Bash2: node4
Bash2: node5
Bash2: node3
Bash2: node4
Bash2: node6
Bash2: node5
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
Bash3: node3
Bash3: node5
Bash3: node3
Bash3: node4
Bash3: node6
Bash3: node4
Bash3: node6
Bash3: node5
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de