[slurm-users] Re: sbatch problem

2024-05-29 Thread Hermann Schwärzler via slurm-users
Hi Mihai,
yes, it's the same problem: when you run
    srun echo $CUDA_VISIBLE_DEVICES
the value of $CUDA_VISIBLE_DEVICES on the first of the two nodes is substituted into the line *before* srun is called.
    srun bash -c 'echo $CUDA_VISIBLE_DEVICES'
is the way to go.
BTW: the job-script I am
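The difference between the two srun lines can be reproduced without a cluster. The sketch below is only an illustration: `fake_srun` is a hypothetical stand-in for srun (not a Slurm command) that runs its arguments twice, once per simulated node, and the CUDA_VISIBLE_DEVICES values are made up.

```shell
#!/bin/bash
# Hypothetical stand-in for srun (an assumption for illustration only):
# runs the given command once per simulated node, each time with a
# different, made-up CUDA_VISIBLE_DEVICES in the command's environment.
fake_srun() {
    CUDA_VISIBLE_DEVICES=0 "$@"      # simulated first node
    CUDA_VISIBLE_DEVICES=0,1 "$@"    # simulated second node
}

# Value seen by the batch shell itself, which runs on the first node.
CUDA_VISIBLE_DEVICES=0

# Unquoted: the batch shell expands the variable before fake_srun starts,
# so both simulated tasks print the first node's value.
unquoted=$(fake_srun echo $CUDA_VISIBLE_DEVICES)

# Single-quoted inside bash -c: the text is passed on unexpanded and each
# simulated task expands it in its own environment.
quoted=$(fake_srun bash -c 'echo $CUDA_VISIBLE_DEVICES')

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

In the unquoted case both simulated tasks report the batch shell's value; in the quoted case each task reports its own.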

[slurm-users] Re: sbatch problem

2024-05-29 Thread Mihai Ciubancan via slurm-users
Dear Hermann,
Sorry to come back to you, but just to understand... if I run the following script:
    #!/bin/bash
    #SBATCH --partition=gpu
    #SBATCH --time=24:00:00
    #SBATCH --nodes=2
    #SBATCH --exclusive
    #SBATCH --job-name="test_job"
    #SBATCH -o stdout_%j
    #SBATCH -e stderr_%j
    touch test.txt
    # Print

[slurm-users] Re: sbatch problem

2024-05-28 Thread Mihai Ciubancan via slurm-users
Dear Hermann,
Thank you for the clarifications and for the quick answer!
Best wishes,
Mihai

On 2024-05-28 13:31, Hermann Schwärzler wrote:
Dear Mihai,
you are not asking Slurm to provide you with any GPUs:
    #SBATCH --gpus=12
So it doesn't reserve any for you and as a consequence also d

[slurm-users] Re: sbatch problem

2024-05-28 Thread Hermann Schwärzler via slurm-users
Dear Mihai,
you are not asking Slurm to provide you with any GPUs:
    #SBATCH --gpus=12
So it doesn't reserve any for you and as a consequence also does not set CUDA_VISIBLE_DEVICES for you.
nvidia-smi works, because it looks like you are not using cgroups at all or at least not "Constrai

[slurm-users] Re: sbatch problem

2024-05-28 Thread Mihai Ciubancan via slurm-users
Dear Hermann, Dear James,
Thank you both for your answers! I have tried as you suggested using bash -c and it worked. But when I'm trying the following script the "bash -c" trick doesn't work:
    #!/bin/bash
    #SBATCH --partition=eli
    #SBATCH --time=24:00:00
    #SBATCH --nodelist=mihaigpu2,mihai-x86

[slurm-users] Re: sbatch problem

2024-05-28 Thread Hermann Schwärzler via slurm-users
Hi Mihai,
this is a problem that is not Slurm related. It's rather about: "when does command substitution happen?"
When you write
    srun echo Running on host: $(hostname)
$(hostname) is replaced by the output of the hostname-command *before* the line is "submitted" to srun. Which means that s
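The timing of command substitution can be checked locally with a minimal sketch. Everything here is an assumption made for illustration: `fake_srun` is a hypothetical stand-in for srun, and the variable `WHERE` together with `echo "$WHERE"` plays the role of `hostname`, so that the "submitting" shell and the "task" give different answers on a single machine.

```shell
#!/bin/bash
# Hypothetical stand-in for srun (an assumption for illustration only):
# runs the given command with WHERE=compute-node in its environment,
# imitating a task that executes on a different host.
fake_srun() {
    WHERE=compute-node "$@"
}

# The shell that interprets the job script "lives" here.
WHERE=submit-node

# Unquoted: $(...) is evaluated by the calling shell first, so fake_srun
# only ever receives the already finished string.
early=$(fake_srun echo "Running on host: $(echo "$WHERE")")

# Single-quoted inside bash -c: the substitution is passed on as plain
# text and evaluated by the inner bash, that is, inside the task.
late=$(fake_srun bash -c 'echo "Running on host: $(echo "$WHERE")"')

echo "$early"
echo "$late"
```

With real srun the same pattern would read `srun bash -c 'echo Running on host: $(hostname)'`, which matches the bash -c approach discussed elsewhere in this thread.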