[slurm-users] sbatch problem
Hello,

My name is Mihai and I have an issue with a small GPU cluster managed with Slurm 22.05.11. I get two different outputs when I try to find out the names of the nodes (one correct and one wrong). The script is:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=/data/mihai/res.txt
#SBATCH --partition=eli
#SBATCH --nodes=2

srun echo Running on host: $(hostname)
srun hostname
srun sleep 15

And the output looks like this:

cat res.txt
Running on host: mihai-x8640
Running on host: mihai-x8640
mihaigpu2
mihai-x8640

As you can see, the output of the command 'srun echo Running on host: $(hostname)' is the same both times, as if the job had run twice on the same node, while the command 'srun hostname' gives me the correct output. Do you have any idea why the outputs of the two commands are different?

Thank you,
Mihai

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
[slurm-users] Re: sbatch problem
Dear Hermann, Dear James,

Thank you both for your answers! I have tried using bash -c as you suggested and it worked. But when I try the following script, the "bash -c" trick doesn't work:

#!/bin/bash
#SBATCH --partition=eli
#SBATCH --time=24:00:00
#SBATCH --nodelist=mihaigpu2,mihai-x8640
#SBATCH --gpus=12
#SBATCH --exclusive
#SBATCH --job-name="test_job"
#SBATCH -o /data/mihai/stdout_%j
#SBATCH -e /data/mihai/stderr_%j

touch test.txt

# Print the hostname of the allocated node
srun bash -c 'echo Running on host: $(hostname)'

# Print the start time
echo "Job started at: $(date)"

# Perform a simple task that takes a few minutes
echo "Starting the task..."
sleep 20

srun echo "GPU UUIDs:"
srun nvidia-smi --query-gpu=uuid --format=csv,noheader
srun bash -c 'echo $CUDA_VISIBLE_DEVICES'
##echo "Task completed."

# Print the end time
echo "Job finished at: $(date)"

I don't get any output from the command srun bash -c 'echo $CUDA_VISIBLE_DEVICES':

Running on host: mihaigpu2
Running on host: mihai-x8640
Job started at: Tue May 28 13:02:59 EEST 2024
Starting the task...
GPU UUIDs:
GPU UUIDs:
GPU-d4e002a9-409f-79bb-70e1-56c1a473a188
GPU-33b728e2-0396-368b-b9c3-8f828ca145b1
GPU-7d90f7d8-aadf-ba95-2409-8c57bd40d24b
GPU-30faa03a-0782-4b6c-dda2-e108159ba953
GPU-37d09257-2582-8080-223a-dd5a646fba43
GPU-c71cbb10-4368-d327-e0e5-56372aa4f10f
GPU-a413a75a-15b2-063e-638f-bde063af5c8e
GPU-bf12181a-e615-dcd4-5da2-9a518ae1af5d
GPU-dfec21c4-e30d-5a36-599d-eef2fd354809
GPU-15a11fe2-33f2-cd65-09f0-9897ba057a0c
GPU-2d971e69-8147-8221-a055-e26573950f91
GPU-22ee3c89-fed1-891f-96bb-6bbf27a2cc4b
Job finished at: Tue May 28 13:03:20 EEST 2024

...I'm not interested in the output of the other 'echo' commands, apart from the one with the hostname, which is why I didn't change them.

Best,
Mihai

On 2024-05-28 12:23, Hermann Schwärzler via slurm-users wrote:
> Hi Mihai,
>
> this is a problem that is not Slurm related. It's rather about: "when does command substitution happen?"
>
> When you write
>     srun echo Running on host: $(hostname)
> $(hostname) is replaced by the output of the hostname command *before* the line is "submitted" to srun. Which means that srun will happily run it on any (remote) node using the name of the host it is running on.
>
> If you want to avoid this, one possible solution is
>     srun bash -c 'echo Running on host: $(hostname)'
> In this case the command substitution happens after srun starts the process on a (potentially remote) node.
>
> Regards,
> Hermann
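For reference, the two behaviours Hermann describes can be put side by side in a minimal two-node batch script. This is only a sketch (partition, output and other options omitted):

#!/bin/bash
#SBATCH --nodes=2

# Expanded by the shell running this batch script (the first node of
# the allocation), so both tasks print that same node's name:
srun echo "Running on host: $(hostname)"

# Single-quoted, so a bash started by srun on each node performs the
# expansion; each task prints the name of its own node:
srun bash -c 'echo "Running on host: $(hostname)"'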
[slurm-users] Re: sbatch problem
Dear Hermann,

Thank you for the clarifications and for the quick answer!

Best wishes,
Mihai

On 2024-05-28 13:31, Hermann Schwärzler wrote:
> Dear Mihai,
>
> you are not asking Slurm to provide you with any GPUs:
>
>     #SBATCH --gpus=12
>
> So it doesn't reserve any for you and as a consequence also does not set CUDA_VISIBLE_DEVICES for you.
>
> nvidia-smi works because it looks like you are not using cgroups at all, or at least not "ConstrainDevices=yes" in e.g. cgroup.conf. So it "sees" all the GPUs that are installed in the node it's running on, even if none is reserved for you by Slurm.
>
> Regards,
> Hermann
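For what it's worth, the ConstrainDevices setting Hermann mentions lives in cgroup.conf; a minimal sketch might look like the following (illustrative values, not this cluster's actual configuration). With it enabled, nvidia-smi inside a job only sees the GPUs Slurm actually allocated to that job:

# /etc/slurm/cgroup.conf (sketch, illustrative values)
ConstrainCores=yes
ConstrainRAMSpace=yes
# Restrict device access to what Slurm allocated, so nvidia-smi only
# shows the job's own GPUs:
ConstrainDevices=yes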
[slurm-users] Re: sbatch problem
Dear Hermann,

Sorry to come back to you, but just to understand... if I run the following script:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --time=24:00:00
#SBATCH --nodes=2
#SBATCH --exclusive
#SBATCH --job-name="test_job"
#SBATCH -o stdout_%j
#SBATCH -e stderr_%j

touch test.txt

# Print the hostname of the allocated node
echo "Running on host: $(hostname)"

# Print the start time
echo "Job started at: $(date)"

# Perform a simple task that takes a few minutes
echo "Starting the task..."
sleep 60

echo "GPU UUIDs:"
nvidia-smi --query-gpu=uuid --format=csv,noheader
echo $CUDA_VISIBLE_DEVICES
echo "Task completed."

# Print the end time
echo "Job finished at: $(date)"

I get the following results:

Starting the task...
GPU UUIDs:
GPU UUIDs:
GPU-d4e002a9-409f-79bb-70e1-56c1a473a188
GPU-33b728e2-0396-368b-b9c3-8f828ca145b1
GPU-7d90f7d8-aadf-ba95-2409-8c57bd40d24b
GPU-30faa03a-0782-4b6c-dda2-e108159ba953
GPU-37d09257-2582-8080-223a-dd5a646fba43
GPU-c71cbb10-4368-d327-e0e5-56372aa4f10f
GPU-a413a75a-15b2-063e-638f-bde063af5c8e
GPU-bf12181a-e615-dcd4-5da2-9a518ae1af5d
GPU-dfec21c4-e30d-5a36-599d-eef2fd354809
GPU-15a11fe2-33f2-cd65-09f0-9897ba057a0c
GPU-2d971e69-8147-8221-a055-e26573950f91
GPU-22ee3c89-fed1-891f-96bb-6bbf27a2cc4b
0,1,2,3
0,1,2,3
Task completed.

whereas for the command echo $CUDA_VISIBLE_DEVICES I would expect to get:

0,1,2,3
0,1,2,3,4,5,6,7

Is this for the same reason that I had problems with hostname?

Thank you,
Mihai

On 2024-05-28 13:31, Hermann Schwärzler wrote:
> you are not asking Slurm to provide you with any GPUs [...]
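One way to see what each node of the allocation actually reports, building on the bash -c approach from earlier in the thread, is to launch one task per node and let each task do its own expansion. This is only a sketch; as Hermann notes, CUDA_VISIBLE_DEVICES will only be set if Slurm really allocated GPUs to the step:

# One task on each of the two allocated nodes; each task expands
# $(hostname) and $CUDA_VISIBLE_DEVICES on the node it runs on:
srun --nodes=2 --ntasks-per-node=1 \
     bash -c 'echo "$(hostname): CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"'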
[slurm-users] jobs dropping
Hello,

We are trying to run some PIConGPU codes on a machine with 8x H100 GPUs, using Slurm. But the jobs don't run and are not in the queue. In the slurmd logs I have:

[2024-10-24T09:50:40.934] CPU_BIND: _set_batch_job_limits: Memory extracted from credential for StepId=1079.batch job_mem_limit= 648000
[2024-10-24T09:50:40.934] Launching batch job 1079 for UID 1009
[2024-10-24T09:50:40.938] debug: acct_gather_energy/none: init: AcctGatherEnergy NONE plugin loaded
[2024-10-24T09:50:40.938] debug: acct_gather_profile/none: init: AcctGatherProfile NONE plugin loaded
[2024-10-24T09:50:40.938] debug: acct_gather_interconnect/none: init: AcctGatherInterconnect NONE plugin loaded
[2024-10-24T09:50:40.938] debug: acct_gather_filesystem/none: init: AcctGatherFilesystem NONE plugin loaded
[2024-10-24T09:50:40.939] debug: gres/gpu: init: loaded
[2024-10-24T09:50:41.022] [1079.batch] debug: cgroup/v2: init: Cgroup v2 plugin loaded
[2024-10-24T09:50:41.026] [1079.batch] debug: CPUs:192 Boards:1 Sockets:2 CoresPerSocket:48 ThreadsPerCore:2
[2024-10-24T09:50:41.026] [1079.batch] debug: jobacct_gather/cgroup: init: Job accounting gather cgroup plugin loaded
[2024-10-24T09:50:41.026] [1079.batch] CPU_BIND: Memory extracted from credential for StepId=1079.batch job_mem_limit=648000 step_mem_limit=648000
[2024-10-24T09:50:41.027] [1079.batch] debug: laying out the 8 tasks on 1 hosts mihaigpu2 dist 2
[2024-10-24T09:50:41.027] [1079.batch] gres_job_state gres:gpu(7696487) type:(null)(0) job:1079 flags:
[2024-10-24T09:50:41.027] [1079.batch]   total_gres:8
[2024-10-24T09:50:41.027] [1079.batch]   node_cnt:1
[2024-10-24T09:50:41.027] [1079.batch]   gres_cnt_node_alloc[0]:8
[2024-10-24T09:50:41.027] [1079.batch]   gres_bit_alloc[0]:0-7 of 8
[2024-10-24T09:50:41.027] [1079.batch] debug: Message thread started pid = 459054
[2024-10-24T09:50:41.027] [1079.batch] debug: Setting slurmstepd(459054) oom_score_adj to -1000
[2024-10-24T09:50:41.027] [1079.batch] debug: switch/none: init: switch NONE plugin loaded
[2024-10-24T09:50:41.027] [1079.batch] debug: task/cgroup: init: core enforcement enabled
[2024-10-24T09:50:41.027] [1079.batch] debug: task/cgroup: task_cgroup_memory_init: task/cgroup/memory: total:2063720M allowed:100%(enforced), swap:0%(permissive), max:100%(2063720M) max+swap:100%(4127440M) min:30M kmem:100%(2063720M permissive) min:30M
[2024-10-24T09:50:41.027] [1079.batch] debug: task/cgroup: init: memory enforcement enabled
[2024-10-24T09:50:41.027] [1079.batch] debug: task/cgroup: init: Tasks containment cgroup plugin loaded
[2024-10-24T09:50:41.027] [1079.batch] cred/munge: init: Munge credential signature plugin loaded
[2024-10-24T09:50:41.027] [1079.batch] debug: job_container/none: init: job_container none plugin loaded
[2024-10-24T09:50:41.030] [1079.batch] debug: spank: opening plugin stack /etc/slurm/plugstack.conf
[2024-10-24T09:50:41.030] [1079.batch] debug: task/cgroup: task_cgroup_cpuset_create: job abstract cores are '0-63'
[2024-10-24T09:50:41.030] [1079.batch] debug: task/cgroup: task_cgroup_cpuset_create: step abstract cores are '0-63'
[2024-10-24T09:50:41.030] [1079.batch] debug: task/cgroup: task_cgroup_cpuset_create: job physical CPUs are '0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,142,144,146,148,150,152,154,156,158'
[2024-10-24T09:50:41.030] [1079.batch] debug: task/cgroup: task_cgroup_cpuset_create: step physical CPUs are '0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,142,144,146,148,150,152,154,156,158'
[2024-10-24T09:50:41.031] [1079.batch] task/cgroup: _memcg_initialize: job: alloc=648000MB mem.limit=648000MB memsw.limit=unlimited
[2024-10-24T09:50:41.031] [1079.batch] task/cgroup: _memcg_initialize: step: alloc=648000MB mem.limit=648000MB memsw.limit=unlimited
[2024-10-24T09:50:41.064] [1079.batch] debug levels are stderr='error', logfile='debug', syslog='quiet'
[2024-10-24T09:50:41.064] [1079.batch] starting 1 tasks
[2024-10-24T09:50:41.064] [1079.batch] task 0 (459058) started 2024-10-24T09:50:41
[2024-10-24T09:50:41.069] [1079.batch] _set_limit: RLIMIT_NOFILE : reducing req:1048576 to max:131072
[2024-10-24T09:51:23.066] debug: _rpc_terminate_job: uid = 64030 JobId=1079
[2024-10-24T09:51:23.067] debug: credential for job 1079 revoked
[2024-10-24T09:51:23.067] [1079.batch] debug: Handling REQUEST_SIGNAL_CONTAINER
[2024-10-24T09:51:23.067] [1079.batch] debug: _handle_signal_container for StepId=1079.batch uid=64030 signal=18
[2024-10-24T09:51:23.068] [1079.batch] Sent signal 18 to StepId=1079.batch
[2024-10-24T09:51:23.068] [1079.batch] debug: Handling REQUEST_SIGNAL_CONTAINER
[2024-10-24T09:51:23.068] [1079.batch] debug: _handle_signal_container for StepI
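The last entries above appear to show the controller telling slurmd to terminate job 1079 about 40 seconds after launch (credential revoked, then signals sent to the batch step), so the reason for the cancellation is most likely recorded on the slurmctld side rather than in slurmd. A few commands that may help narrow this down (a sketch; job ID 1079 is taken from the log above, sacct requires accounting to be enabled, and the log path may differ on your system):

# Ask the accounting database how the job ended (state, exit code, runtime):
sacct -j 1079 --format=JobID,JobName,State,ExitCode,Elapsed,ReqMem,NodeList

# While the job record still exists in the controller, show it in full,
# including the Reason field:
scontrol show job 1079

# Check the controller's log on the slurmctld host for the decision to
# kill the job (time limit, OOM, preemption, node failure, ...):
grep 1079 /var/log/slurm/slurmctld.log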