Hi Sam,

This is expected; it is simply how bash works.

Regarding the #SBATCH --output problem: this seems to be an error, because only one output file is created (I just tested it myself).


Regarding variable substitution:

srun echo SLURMD_NODENAME:$SLURMD_NODENAME SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID SLURM_JOB_ID:$SLURM_JOB_ID SLURM_TASK_PID:$SLURM_TASK_PID

bash evaluates the variables before the actual program is started. Otherwise, e.g., "cd $HOME" would not work: on most Unix-like systems no directory literally named $HOME exists, whereas the variable HOME points to the user's home directory.

So what you are actually running is:

srun echo SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
This is exactly the output you received.


Here's what you could try:

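srun echo 'SLURMD_NODENAME:$SLURMD_NODENAME SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID SLURM_JOB_ID:$SLURM_JOB_ID SLURM_TASK_PID:$SLURM_TASK_PID'

The single quotes (not backticks!) prevent bash from replacing the variables in the batch script. Note that echo itself does not expand variables, so to see the per-node values the string still has to be evaluated by a shell on the compute node, as in your "bash -c" line.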
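
If you want to see the effect of expansion timing without a cluster, here is a minimal sketch. It assumes nothing about Slurm: fake_srun is a hypothetical stand-in I made up for illustration, which just runs a command with a different SLURMD_NODENAME in its environment, roughly the way each remote task sees its own value.

```shell
#!/bin/bash
# Hypothetical stand-in for srun (NOT a Slurm command): runs the given
# command with SLURMD_NODENAME=node6 in its environment.
fake_srun() {
    SLURMD_NODENAME=node6 "$@"
}

# Value as seen by the shell that interprets the batch script.
export SLURMD_NODENAME=node3

# Unquoted/double-quoted: the calling shell expands the variable
# before fake_srun is even started.
early=$(fake_srun echo "NODE:$SLURMD_NODENAME")

# Single-quoted and handed to bash -c: the shell started inside the
# task performs the expansion, using the task's environment.
late=$(fake_srun bash -c 'echo "NODE:$SLURMD_NODENAME"')

# Single-quoted without a shell: echo receives the text unchanged.
literal=$(fake_srun echo 'NODE:$SLURMD_NODENAME')

echo "$early"
echo "$late"
echo "$literal"
```

The first line shows the submitting shell's value, the second the value seen inside the task, and the third the unexpanded text, which is why the quoted string needs a shell on the other side to be useful.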


Best
Marcus


On 07/13/2018 06:54 PM, Sam wrote:

StackOverflow Thread: https://stackoverflow.com/questions/51328917/slurm-sbatch-multiple-nodes-same-slurmd-nodename


possibly related to:
https://groups.google.com/forum/#!topic/slurm-users/suclnO2V0aA

 - slurm-wlm 17.11.2
 - Installed from Ubuntu Apt repos, Ubuntu:18.04

We have a cluster of 20 identical nodes.
Running the simple script below gives me a confusing problem.
All the jobs think they are running on node3, while running the hostname command gives the accurate answer. This is also a problem for the output filename. I expected to have many different outputs, but I get just one, with 'node3' in the filename. This seems to be a Bash Eval() / Variable substitution error.
Wrapping

$SLURMD_NODENAME

in a

  bash -c "echo Bash3: \$SLURMD_NODENAME"

works. But why did I have to do this? This workaround won't work for the #SBATCH --output

cn.job:

  #!/bin/bash
  #SBATCH --output=/share/output.txt.%j.%J.%a.%A.%n.%N.%s.%t.%x
  #SBATCH --time=00:00:30
  #SBATCH --tasks-per-node=2
  #SBATCH --nodes=4
  srun hostname
  srun bash -c "echo Bash2: \$(hostname)"
  srun echo SLURMD_NODENAME:$SLURMD_NODENAME SLURM_ARRAY_TASK_ID:$SLURM_ARRAY_TASK_ID SLURM_ARRAY_JOB_ID:$SLURM_ARRAY_JOB_ID SLURM_JOB_ID:$SLURM_JOB_ID SLURM_TASK_PID:$SLURM_TASK_PID
  srun bash -c "echo Bash3: \$SLURMD_NODENAME"
  srun sleep 20

Ran like:

  sbatch cn.job

produces this output:

**/share/output.txt.2056.2056.4294967294.2056.0.node3.4294967294.0.cn.job**

  node3
  node3
  node6
  node4
  node5
  node6
  node4
  node5
  Bash2: node3
  Bash2: node6
  Bash2: node4
  Bash2: node5
  Bash2: node3
  Bash2: node4
  Bash2: node6
  Bash2: node5
  SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
  SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
  SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
  SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
  SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
  SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
  SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
  SLURMD_NODENAME:node3 SLURM_ARRAY_TASK_ID: SLURM_ARRAY_JOB_ID: SLURM_JOB_ID:2056 SLURM_TASK_PID:6441
  Bash3: node3
  Bash3: node5
  Bash3: node3
  Bash3: node4
  Bash3: node6
  Bash3: node4
  Bash3: node6
  Bash3: node5


--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de
