[slurm-users] How to get the CPU usage of history jobs at each compute node?
Dear all,
How can I view the CPU usage of historical jobs on each compute node? The command (scontrol show job <jobid> --detail) can only show the CPU usage of currently running jobs on each compute node.
Appreciatively,
Menglong
Re: [slurm-users] How to get the CPU usage of history jobs at each compute node?
Thanks to Merlin Hartley and Eli V for their replies!

The command (sacct -j <jobid>) only reports the total number of CPUs of a historical job. The command (scontrol show job <jobid> --detail), on the other hand, shows the number of CPUs a running job uses on each node. What I would like is the number of CPUs a given historical job used on EACH compute node, i.e., a combination of the two outputs above.

Best wishes!
Hu Menglong
HPC Department, Sugon (Dawning Information Industry Co., Ltd.)
Sugon Building (Building 3), 78 Zhuzhou Road, Laoshan District, Qingdao 266000
Tel: 135-6164-9610

From: Merlin Hartley
Date: 2019-02-15 23:36
To: Slurm User Community List
CC: Zhang Tao
Subject: Re: [slurm-users] How to get the CPU usage of history jobs at each compute node?

using sacct [1] - assuming you have accounting [2] enabled:
sacct -j
Hope this helps!
Merlin
[1] https://slurm.schedmd.com/sacct.html
[2] https://slurm.schedmd.com/accounting.html
--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
University of Cambridge
Cambridge, CB2 0XY
United Kingdom

On 15 Feb 2019, at 10:05, hu...@sugon.com wrote:
Dear all,
How can I view the CPU usage of historical jobs on each compute node? The command (scontrol show job <jobid> --detail) can only show the CPU usage of currently running jobs on each compute node.
Appreciatively,
Menglong
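A starting point, assuming accounting is enabled: sacct can report node and CPU fields per job step, which gets part of the way there. The field names below are standard sacct format options and <jobid> is a placeholder:

sacct -j <jobid> --format=JobID,JobName,NodeList,NTasks,AllocCPUS --parsable2

Note that AllocCPUS is reported per job or per step rather than per node, so the per-node breakdown still has to be derived from the step records; as far as I know there is no single sacct field that gives exactly the per-node CPU count of a finished job, which is the gap described above.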
[slurm-users] Fwd: a heterogeneous job terminates unexpectedly
Dear all,

I have a cluster with 9 nodes (cmbc[1530-1538]); each node has 2 CPUs and each CPU has 32 cores. When I submitted a heterogeneous job twice, the second job terminated unexpectedly. This problem has been bothering me all day. The Slurm version is 18.08.5, and here is the job script:

#!/bin/bash
#SBATCH -J FIRE
#SBATCH -o log.heter.%j
#SBATCH -e log.heter.%j
#SBATCH --comment=WRF
#SBATCH --mem=20G
#SBATCH -p largemem
#SBATCH -n 64 -N 2
#SBATCH packjob
#SBATCH -J HAHA1
#SBATCH -p largemem
#SBATCH -n 16 -N 1
#SBATCH --mem=20G
#SBATCH packjob
#SBATCH -J HAHA2
#SBATCH -w cmbc1533
#SBATCH -p largemem
#SBATCH -n 8 -N 1
#SBATCH --mem=20G
module load compiler/intel/composer_xe_2018.1.163
module load mpi/intelmpi/2018.1
export I_MPI_PMI_LIBRARY=/opt/slurm18/lib/libpmi.so
time srun --mpi=pmi2 ./inter_fire 96000 : ./intel_fire 96000 : ./intel_fire 96000
date

Here is the error from the terminated job:

Appreciatively,
Menglong

Best wishes!
Hu Menglong
HPC Department, Sugon (Dawning Information Industry Co., Ltd.)
Sugon Building (Building 3), 78 Zhuzhou Road, Laoshan District, Qingdao 266000
Tel: 135-6164-9610
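One way to narrow down which part of the pack job is failing (a sketch only; it assumes srun's --pack-group option available in 18.08, and is not a confirmed fix): replace the combined srun line in the script above with a per-component launch and see which component dies, for example:

# run only the second component (pack group 1) of the heterogeneous allocation
srun --pack-group=1 --mpi=pmi2 ./intel_fire 96000

If a single component already fails on its own, the problem lies in that component's resources rather than in the combined step launch.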
[slurm-users] OverSubscribe=FORCE:1 overloads nodes?
Dear all,
I have two jobs in my cluster, which has 32 cores per compute node. The first job uses eight nodes and 256 cores, so it occupies all of the cores on those eight nodes. The second job uses five nodes and 32 cores, so only some of the cores on each of those five nodes are used. Slurm, however, allocated some of the same nodes to both jobs, overloading those nodes. I wonder whether my partition setting OverSubscribe=FORCE:1 caused this. How can I prevent it from happening?
Appreciatively,
Menglong
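For reference, a minimal slurm.conf sketch of the configuration I would compare against (node and partition names here are placeholders, and whether it matches your setup is an assumption): with consumable-resource scheduling Slurm tracks individual cores, so two jobs should not be given the same cores even when they share nodes, while OverSubscribe=NO rules out sharing of allocated resources altogether.

# slurm.conf sketch with placeholder node/partition names
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
NodeName=node[01-13] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=192000
PartitionName=work Nodes=node[01-13] OverSubscribe=NO Default=YES MaxTime=INFINITE State=UP

If the cluster uses select/linear instead, allocation is done by whole nodes and FORCE:1 may behave differently; "scontrol show config | grep -i Select" shows which plugin is active.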
[slurm-users] pmix and ucx (IB) testing fails with error "Cannot get polling fd"
Hi,

When I was testing slurm-19.05.3 with openmpi-4.0.1, pmix-3.1.3rc4 and ucx-1.6.1 (with IB), I got a different error from Bug 7646 (https://bugs.schedmd.com/show_bug.cgi?id=7646). At first, a job like "srun --mpi=pmix_v3 xxx" could run with "SLURM_PMIX_DIRECT_CONN=true" and "SLURM_PMIX_DIRECT_CONN_UCX=false", but the job ended immediately when "SLURM_PMIX_DIRECT_CONN_UCX=true" was configured, and I got the error "Cannot get polling fd" after "Fail to create UCX worker: Input/output error".

I have confirmed that the error message comes from ucp_worker_create, so I printed the config after ucp_config_read("SLURM", NULL, &config) and the context after ucp_init(&ucp_params, config, &ucp_context). Here is the UCX configuration:

UCX_NET_DEVICES=all
UCX_SHM_DEVICES=all
UCX_ACC_DEVICES=all
UCX_SELF_DEVICES=all
UCX_TLS=all
UCX_ALLOC_PRIO=md:sysv,md:posix,huge,thp,md:*,mmap,heap
UCX_SOCKADDR_AUX_TLS=ud,ud_x
UCX_WARN_INVALID_CONFIG=y
UCX_BCOPY_THRESH=0
UCX_RNDV_THRESH=auto
UCX_RNDV_SEND_NBR_THRESH=256k
UCX_RNDV_THRESH_FALLBACK=inf
UCX_RNDV_PERF_DIFF=1.000
UCX_MAX_EAGER_RAILS=1
UCX_MAX_RNDV_RAILS=1
UCX_RNDV_SCHEME=auto
UCX_ZCOPY_THRESH=auto
UCX_BCOPY_BW=5800m
UCX_ATOMIC_MODE=guess
UCX_MAX_WORKER_NAME=32
UCX_USE_MT_MUTEX=n
UCX_ADAPTIVE_PROGRESS=y
UCX_SEG_SIZE=8k
UCX_TM_THRESH=1k
UCX_TM_MAX_BB_SIZE=1k
UCX_TM_FORCE_THRESH=8k
UCX_NUM_EPS=auto
UCX_RNDV_FRAG_SIZE=256k
UCX_MEMTYPE_CACHE=y
UCX_FLUSH_WORKER_EPS=y
UCX_UNIFIED_MODE=n

And here is the UCP context:

#
# UCP context
#
#   md 0  : self
#   md 1  : tcp
#   md 2  : ib/mlx5_3
#   md 3  : ib/mlx5_2
#   md 4  : ib/mlx5_1
#   md 5  : ib/mlx5_0
#   md 6  : rdmacm
#   md 7  : sysv
#   md 8  : posix
#   md 9  : cma
#   md 10 : knem
#
#   resource 0  : md 0  dev 0  flags -- self/self
#   resource 1  : md 1  dev 1  flags -- tcp/ib0
#   resource 2  : md 1  dev 2  flags -- tcp/eno1
#   resource 3  : md 2  dev 3  flags -- rc/mlx5_3:1
#   resource 4  : md 2  dev 3  flags -- rc_mlx5/mlx5_3:1
#   resource 5  : md 2  dev 3  flags -- dc/mlx5_3:1
#   resource 6  : md 2  dev 3  flags -- dc_mlx5/mlx5_3:1
#   resource 7  : md 2  dev 3  flags -- ud/mlx5_3:1
#   resource 8  : md 2  dev 3  flags -- ud_mlx5/mlx5_3:1
#   resource 9  : md 2  dev 3  flags -- cm/mlx5_3:1
#   resource 10 : md 3  dev 4  flags -- rc/mlx5_2:1
#   resource 11 : md 3  dev 4  flags -- rc_mlx5/mlx5_2:1
#   resource 12 : md 3  dev 4  flags -- dc/mlx5_2:1
#   resource 13 : md 3  dev 4  flags -- dc_mlx5/mlx5_2:1
#   resource 14 : md 3  dev 4  flags -- ud/mlx5_2:1
#   resource 15 : md 3  dev 4  flags -- ud_mlx5/mlx5_2:1
#   resource 16 : md 3  dev 4  flags -- cm/mlx5_2:1
#   resource 17 : md 4  dev 5  flags -- rc/mlx5_1:1
#   resource 18 : md 4  dev 5  flags -- rc_mlx5/mlx5_1:1
#   resource 19 : md 4  dev 5  flags -- dc/mlx5_1:1
#   resource 20 : md 4  dev 5  flags -- dc_mlx5/mlx5_1:1
#   resource 21 : md 4  dev 5  flags -- ud/mlx5_1:1
#   resource 22 : md 4  dev 5  flags -- ud_mlx5/mlx5_1:1
#   resource 23 : md 4  dev 5  flags -- cm/mlx5_1:1
#   resource 24 : md 5  dev 6  flags -- rc/mlx5_0:1
#   resource 25 : md 5  dev 6  flags -- rc_mlx5/mlx5_0:1
#   resource 26 : md 5  dev 6  flags -- dc/mlx5_0:1
#   resource 27 : md 5  dev 6  flags -- dc_mlx5/mlx5_0:1
#   resource 28 : md 5  dev 6  flags -- ud/mlx5_0:1
#   resource 29 : md 5  dev 6  flags -- ud_mlx5/mlx5_0:1
#   resource 30 : md 5  dev 6  flags -- cm/mlx5_0:1
#   resource 31 : md 6  dev 7  flags -s rdmacm/sockaddr
#   resource 32 : md 7  dev 8  flags -- mm/sysv
#   resource 33 : md 8  dev 9  flags -- mm/posix
#   resource 34 : md 9  dev 10 flags -- cma/cma
#   resource 35 : md 10 dev 11 flags -- knem/knem
#

Looking forward to your reply.
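In case it helps to narrow this down, two small checks worth trying (standard UCX and Slurm tools; the program name ./hello is a placeholder for the test binary): first compare the transports UCX itself detects on a compute node with the context dump above, then rerun the failing case with UCX debug logging to see why ucp_worker_create() returns Input/output error.

# on a compute node: UCX version and detected transports/devices
ucx_info -v
ucx_info -d | grep -E 'Transport|Device'

# rerun the failing case with verbose UCX logging; since the plugin reads its
# UCX config with the "SLURM" prefix, the prefixed variable is set as well
export SLURM_PMIX_DIRECT_CONN=true
export SLURM_PMIX_DIRECT_CONN_UCX=true
export UCX_LOG_LEVEL=debug
export SLURM_UCX_LOG_LEVEL=debug
srun --mpi=pmix_v3 ./hello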
[slurm-users] why is the performance of an MPI Allreduce job lower than expected?
Dear all,

We tested an MPI Allreduce job in three modes (srun-dtcp, mpirun-slurm, mpirun-ssh) and found that the running time in the mpirun-ssh mode is shorter than in the other two modes. We have set the following parameters:

/usr/lib/systemd/system/slurmd.service:
LimitMEMLOCK=infinity
LimitSTACK=infinity

/etc/sysconfig/slurmd:
ulimit -l unlimited
ulimit -s unlimited

We want to know whether this is normal. Will features such as cgroup and pam_slurm_adopt limit job performance, and how can we improve the efficiency of Slurm jobs?

Here are the test results (one row per message size):

size      srun-dtcp   mpirun-slurm   mpirun-ssh
0         0.05        0.06           0.05
4         2551.83     355.67         281.02
8         1469.32     2419.97        139
16        67.41       151.87         1000.15
32        73.31       143.15         126.22
64        107.14      111.6          126.3
128       77.12       473.62         344.36
256       46.39       417.95         65.09
512       92.84       260.69         90.5
1024      97.9        137.13         112.3
2048      286.27      233.21         169.59
4096      155.69      343.59         160.9
8192      261.02      465.78         151.29
16384     12518.04    13363.71       3869.58
32768     22071.6     11398.21       4842.32
65536     6041.2      .95            3368.58
131072    10436.11    18071.53       10909.76
262144    13802.22    24728.78       8263.53
524288    13086.26    16394.72       4825.51
1048576   28193.08    15943.26       6490.29
2097152   63277.73    24411.58       15361.7
4194304   58538.05    60516.15       33955.49

(1) srun-dtcp job:

#!/bin/bash
#SBATCH -J test
#SBATCH -N 32
#SBATCH --ntasks-per-node=30
#SBATCH -p seperate
NP=$SLURM_NTASKS
srun --mpi=pmix_v3 /public/software/benchmark/imb/hpcx/2017/IMB-MPI1 -npmin $NP Allreduce

(2) mpirun-slurm job:

#!/bin/bash
#SBATCH -J test
#SBATCH -N 32
#SBATCH --ntasks-per-node=30
#SBATCH -p seperate
NP=$SLURM_NTASKS
mpirun /public/software/benchmark/imb/hpcx/2017/IMB-MPI1 -npmin $NP Allreduce

(3) mpirun-ssh job:

#!/bin/bash
#SBATCH -J test
#SBATCH -N 32
#SBATCH --ntasks-per-node=30
#SBATCH -p seperate
env | grep SLURM > env.log
scontrol show hostname > nd.$SLURM_JOBID
NNODE=$SLURM_NNODES
NP=$SLURM_NTASKS
mpirun -np $NP -machinefile nd.$SLURM_JOBID -map-by ppr:30:node \
  -mca plm rsh -mca plm_rsh_no_tree_spawn 1 -mca plm_rsh_num_concurrent $NNODE \
  /public/software/benchmark/imb/hpcx/2017/IMB-MPI1 -npmin $NP Allreduce

Best wishes!
menglong
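One quick check that may explain part of the gap (a sketch; it only mirrors the partition and pmix settings used above): compare the limits and cgroup placement seen by a Slurm-launched task with what the same commands report over ssh. If the memlock or stack limit is lower inside the job, or the cgroup plugin confines the tasks more tightly than expected, that alone can hurt InfiniBand performance.

#!/bin/bash
#SBATCH -J limits-check
#SBATCH -N 2
#SBATCH --ntasks-per-node=1
#SBATCH -p seperate
# print the memlock/stack limits and cgroup membership seen by a Slurm task;
# run "ulimit -l; ulimit -s" over ssh on the same nodes for comparison
srun --mpi=pmix_v3 bash -c 'hostname; ulimit -l; ulimit -s; cat /proc/self/cgroup'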