Hi Mihai,
yes, it's the same problem: when you run
srun echo $CUDA_VISIBLE_DEVICES
the value of $CUDA_VISIBLE_DEVICES on the first of the two nodes is
substituted into the line *before* srun is called.
srun bash -c 'echo $CUDA_VISIBLE_DEVICES'
is the way to go.
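To illustrate, here is a minimal sketch of a job script showing both
variants (partition name and GPU count are just placeholders):

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --nodes=2
#SBATCH --gpus=2
# Expanded by the shell running this script on the first node,
# *before* srun is called:
srun echo $CUDA_VISIBLE_DEVICES
# The single quotes defer the expansion to the bash that srun
# starts on each node:
srun bash -c 'echo $CUDA_VISIBLE_DEVICES'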
BTW: the job-script I am
Dear Hermann,
Sorry to come back to you, but just to understand... if I run the
following script:
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --time=24:00:00
#SBATCH --nodes=2
#SBATCH --exclusive
#SBATCH --job-name="test_job"
#SBATCH -o stdout_%j
#SBATCH -e stderr_%j
touch test.txt
# Print
Dear Hermann,
Thank you for the clarifications and for the quick answer!
Best wishes,
Mihai
On 2024-05-28 13:31, Hermann Schwärzler wrote:
Dear Mihai,
you are not asking Slurm to provide you with any GPUs, e.g. with a line like
#SBATCH --gpus=12
So it doesn't reserve any for you and as a consequence also does not set
CUDA_VISIBLE_DEVICES for you.
Dear Mihai,
you are not asking Slurm to provide you with any GPUs, e.g. with a line like
#SBATCH --gpus=12
So it doesn't reserve any for you and as a consequence also does not set
CUDA_VISIBLE_DEVICES for you.
nvidia-smi works because it looks like you are not using cgroups at all,
or at least not "ConstrainDevices=yes" in your cgroup.conf.
Dear Hermann,
Dear James,
Thank you both for your answers!
I have tried using bash -c as you suggested and it worked.
But when I try the following script, the "bash -c" trick doesn't
work:
#!/bin/bash
#SBATCH --partition=eli
#SBATCH --time=24:00:00
#SBATCH --nodelist=mihaigpu2,mihai-x86
Hi Mihai,
this is a problem that is not Slurm-related. It's rather about:
"when does command substitution happen?"
When you write
srun echo Running on host: $(hostname)
$(hostname) is replaced by the output of the hostname command *before*
the line is "submitted" to srun. Which means that srun gets the
already-expanded line and every task prints the same hostname: the one
of the node your job script runs on.
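As a small sketch of how to defer the expansion to the compute nodes
(untested):

# expanded once, on the node running the job script:
srun echo Running on host: $(hostname)
# expanded on every node, by the bash that srun starts there:
srun bash -c 'echo Running on host: $(hostname)'

The single quotes keep the submitting shell from touching $(hostname);
the bash that srun starts on each node expands it instead.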