[slurm-users] How to get the CPU usage of history jobs at each compute node?

2019-02-15 Thread hu...@sugon.com
Dear all,
How can I view the CPU usage of historical jobs on each compute node?
However, the command (scontrol show job <jobid> --details) can only show the CPU 
usage of a currently running job on each compute node:

Appreciatively,
Menglong


Re: [slurm-users] How to get the CPU usage of history jobs at each compute node?

2019-02-15 Thread hu...@sugon.com
Thanks to Merlin Hartley and Eli V for their replies!
The command (sacct -j <jobid>) only gives the total number of CPUs for a 
historical job:
However, the command (scontrol show job <jobid> --details) gives the number of 
CPUs a running job uses on each node:
What I expect is the number of CPUs a given historical job used on EACH compute 
node, i.e. a combination of the two above.
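
For anyone searching later, a sketch of the closest thing I know of in stock 
Slurm (assuming accounting storage is enabled; <jobid> is a placeholder): the 
per-step NodeList and AllocCPUS fields from sacct show which nodes each step 
ran on and how many CPUs it was allocated in total, which is not an exact 
per-node breakdown but is often close enough.

# Per-step view of a finished job: the nodes each step ran on and the
# total CPUs it was allocated (<jobid> is a placeholder).
sacct -j <jobid> --format=JobID,JobName,NodeList%40,NTasks,AllocCPUS,Elapsed,TotalCPU

# For a job that is still running, the per-node CPU IDs are visible here:
scontrol show job <jobid> --details | grep CPU_IDs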



Best wishes!
Name:       Hu Menglong
Mobile:     135-6164-9610
Department: HPC


Sugon International Information Industry Co., Ltd.
Sugon Building (Building 3), 78 Zhuzhou Road, Laoshan District, Qingdao 266000


 
From: Merlin Hartley
Date: 2019-02-15 23:36
To: Slurm User Community List
CC: 张涛
Subject: Re: [slurm-users] How to get the CPU usage of history jobs at each 
compute node?
using sacct [1] - assuming you have accounting [2] enabled:

sacct -j 

Hope this helps!


Merlin


[1] https://slurm.schedmd.com/sacct.html
[2] https://slurm.schedmd.com/accounting.html
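
A quick, hedged way to double-check that accounting data is actually being 
collected before relying on sacct (plugin names vary by site):

# Confirm an accounting storage plugin and a job accounting gather
# plugin are configured (e.g. slurmdbd + jobacct_gather/cgroup):
scontrol show config | grep -Ei 'AccountingStorageType|JobAcctGatherType'

# If slurmdbd is in use, the cluster should be registered here:
sacctmgr show cluster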


--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
University of Cambridge
Cambridge, CB2 0XY
United Kingdom




[slurm-users] Fwd: a heterogeneous job terminated unexpectedly

2019-02-27 Thread hu...@sugon.com
Dear all,
I have a cluster with 9 nodes (cmbc[1530-1538]); each node has 2 CPUs and 
each CPU has 32 cores. But when I submitted a heterogeneous job twice, the 
second job terminated unexpectedly. This problem has been bothering me all 
day. The Slurm version is 18.08.5, and here is the job script:
**
#!/bin/bash
#SBATCH -J FIRE
#SBATCH -o log.heter.%j
#SBATCH -e log.heter.%j
#SBATCH --comment=WRF
#SBATCH --mem=20G
#SBATCH -p largemem
#SBATCH -n 64 -N 2
#SBATCH packjob
#SBATCH -J HAHA1
#SBATCH -p largemem
#SBATCH -n 16 -N 1
#SBATCH --mem=20G
#SBATCH packjob
#SBATCH -J HAHA2
#SBATCH -w cmbc1533
#SBATCH -p largemem
#SBATCH -n 8 -N 1
#SBATCH --mem=20G

module load compiler/intel/composer_xe_2018.1.163
module load mpi/intelmpi/2018.1
export I_MPI_PMI_LIBRARY=/opt/slurm18/lib/libpmi.so
time srun --mpi=pmi2 ./inter_fire 96000 : ./intel_fire 96000 : ./intel_fire 96000
date
*
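For reference, a sketch of how to see how Slurm split a submission like this 
into pack-job components before digging into the failure itself (<jobid> is a 
placeholder):

# Each component of a heterogeneous (pack) job appears as <jobid>+0, <jobid>+1, ...
squeue -j <jobid>
# One record per pack component, including the nodes assigned to each:
scontrol show job <jobid>
# After the job has ended, the components remain visible in accounting:
sacct -j <jobid> --format=JobID,JobName,State,NNodes,AllocCPUS,NodeList
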
Here is the error from the terminated job:
Appreciatively,
Menglong


Best wishes!
Name:       Hu Menglong
Mobile:     135-6164-9610
Department: HPC


Sugon International Information Industry Co., Ltd.
Sugon Building (Building 3), 78 Zhuzhou Road, Laoshan District, Qingdao 266000




[slurm-users] OverSubscribe=FORCE:1 overloads nodes?

2019-09-08 Thread hu...@sugon.com
Dear all,
I have two jobs in my cluster, which has 32 cores per compute node. The 
first job uses eight nodes and 256 cores, so it occupies all of the cores on 
its eight nodes. The second job uses five nodes and 32 cores, so only some of 
the cores on its five nodes are used. Slurm, however, allocated some of the 
same nodes to both jobs, overloading those nodes. I wonder whether my 
partition setting OverSubscribe=FORCE:1 caused this. How can I prevent it 
from happening?
Appreciatively,
Menglong
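
A hedged sketch of how to gather the settings that decide whether cores can be 
shared, which would help narrow this down (<partition> and <jobid> are 
placeholders):

# Cluster-wide: which select plugin and consumable-resource options are in use
scontrol show config | grep -E 'SelectType|SelectTypeParameters|PreemptMode'

# Partition-level OverSubscribe setting:
scontrol show partition <partition> | grep -i OverSubscribe

# Which CPU IDs each of the two jobs was actually given on the shared nodes:
scontrol show job <jobid> --details | grep CPU_IDs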


[slurm-users] pmix and ucx(IB) testing fails with error "Cannot get polling fd"

2019-11-09 Thread hu...@sugon.com
Hi,
When I was testing slurm-19.05.3 with openmpi-4.0.1, pmix-3.1.3rc4 and 
ucx-1.6.1 (with IB), I got a different error from the one in Bug 7646 
(https://bugs.schedmd.com/show_bug.cgi?id=7646). At first, a job like 
"srun --mpi=pmix_v3 xxx" ran fine with "SLURM_PMIX_DIRECT_CONN=true" and 
"SLURM_PMIX_DIRECT_CONN_UCX=false", but the job ended immediately once 
"SLURM_PMIX_DIRECT_CONN_UCX=true" was set, and I got the error 
"Cannot get polling fd" after "Fail to create UCX worker: Input/output error".
I've confirmed that the error message comes from ucp_worker_create, so I 
dumped the config after ucp_config_read("SLURM", NULL, &config) and the UCP 
context after ucp_init(&ucp_params, config, &ucp_context):
UCX_NET_DEVICES=all
UCX_SHM_DEVICES=all
UCX_ACC_DEVICES=all
UCX_SELF_DEVICES=all
UCX_TLS=all
UCX_ALLOC_PRIO=md:sysv,md:posix,huge,thp,md:*,mmap,heap
UCX_SOCKADDR_AUX_TLS=ud,ud_x
UCX_WARN_INVALID_CONFIG=y
UCX_BCOPY_THRESH=0
UCX_RNDV_THRESH=auto
UCX_RNDV_SEND_NBR_THRESH=256k
UCX_RNDV_THRESH_FALLBACK=inf
UCX_RNDV_PERF_DIFF=1.000
UCX_MAX_EAGER_RAILS=1
UCX_MAX_RNDV_RAILS=1
UCX_RNDV_SCHEME=auto
UCX_ZCOPY_THRESH=auto
UCX_BCOPY_BW=5800m
UCX_ATOMIC_MODE=guess
UCX_MAX_WORKER_NAME=32
UCX_USE_MT_MUTEX=n
UCX_ADAPTIVE_PROGRESS=y
UCX_SEG_SIZE=8k
UCX_TM_THRESH=1k
UCX_TM_MAX_BB_SIZE=1k
UCX_TM_FORCE_THRESH=8k
UCX_NUM_EPS=auto
UCX_RNDV_FRAG_SIZE=256k
UCX_MEMTYPE_CACHE=y
UCX_FLUSH_WORKER_EPS=y
UCX_UNIFIED_MODE=n
hello 2
UCX_NET_DEVICES=all
UCX_SHM_DEVICES=all
UCX_ACC_DEVICES=all
UCX_SELF_DEVICES=all
UCX_TLS=all
UCX_ALLOC_PRIO=md:sysv,md:posix,huge,thp,md:*,mmap,heap
UCX_SOCKADDR_AUX_TLS=ud,ud_x
UCX_WARN_INVALID_CONFIG=y
UCX_BCOPY_THRESH=0
UCX_RNDV_THRESH=auto
UCX_RNDV_SEND_NBR_THRESH=256k
UCX_RNDV_THRESH_FALLBACK=inf
UCX_RNDV_PERF_DIFF=1.000
UCX_MAX_EAGER_RAILS=1
UCX_MAX_RNDV_RAILS=1
UCX_RNDV_SCHEME=auto
UCX_ZCOPY_THRESH=auto
UCX_BCOPY_BW=5800m
UCX_ATOMIC_MODE=guess
UCX_MAX_WORKER_NAME=32
UCX_USE_MT_MUTEX=n
UCX_ADAPTIVE_PROGRESS=y
UCX_SEG_SIZE=8k
UCX_TM_THRESH=1k
UCX_TM_MAX_BB_SIZE=1k
UCX_TM_FORCE_THRESH=8k
UCX_NUM_EPS=auto
UCX_RNDV_FRAG_SIZE=256k
UCX_MEMTYPE_CACHE=y
UCX_FLUSH_WORKER_EPS=y
UCX_UNIFIED_MODE=n
#
# UCP context
#
#md 0  :  self
#md 1  :  tcp
#md 2  :  ib/mlx5_3
#md 3  :  ib/mlx5_2
#md 4  :  ib/mlx5_1
#md 5  :  ib/mlx5_0
#md 6  :  rdmacm
#md 7  :  sysv
#md 8  :  posix
#md 9  :  cma
#md 10 :  knem
#
#  resource 0  :  md 0  dev 0  flags -- self/self
#  resource 1  :  md 1  dev 1  flags -- tcp/ib0
#  resource 2  :  md 1  dev 2  flags -- tcp/eno1
#  resource 3  :  md 2  dev 3  flags -- rc/mlx5_3:1
#  resource 4  :  md 2  dev 3  flags -- rc_mlx5/mlx5_3:1
#  resource 5  :  md 2  dev 3  flags -- dc/mlx5_3:1
#  resource 6  :  md 2  dev 3  flags -- dc_mlx5/mlx5_3:1
#  resource 7  :  md 2  dev 3  flags -- ud/mlx5_3:1
#  resource 8  :  md 2  dev 3  flags -- ud_mlx5/mlx5_3:1
#  resource 9  :  md 2  dev 3  flags -- cm/mlx5_3:1
#  resource 10 :  md 3  dev 4  flags -- rc/mlx5_2:1
#  resource 11 :  md 3  dev 4  flags -- rc_mlx5/mlx5_2:1
#  resource 12 :  md 3  dev 4  flags -- dc/mlx5_2:1
#  resource 13 :  md 3  dev 4  flags -- dc_mlx5/mlx5_2:1
#  resource 14 :  md 3  dev 4  flags -- ud/mlx5_2:1
#  resource 15 :  md 3  dev 4  flags -- ud_mlx5/mlx5_2:1
#  resource 16 :  md 3  dev 4  flags -- cm/mlx5_2:1
#  resource 17 :  md 4  dev 5  flags -- rc/mlx5_1:1
#  resource 18 :  md 4  dev 5  flags -- rc_mlx5/mlx5_1:1
#  resource 19 :  md 4  dev 5  flags -- dc/mlx5_1:1
#  resource 20 :  md 4  dev 5  flags -- dc_mlx5/mlx5_1:1
#  resource 21 :  md 4  dev 5  flags -- ud/mlx5_1:1
#  resource 22 :  md 4  dev 5  flags -- ud_mlx5/mlx5_1:1
#  resource 23 :  md 4  dev 5  flags -- cm/mlx5_1:1
#  resource 24 :  md 5  dev 6  flags -- rc/mlx5_0:1
#  resource 25 :  md 5  dev 6  flags -- rc_mlx5/mlx5_0:1
#  resource 26 :  md 5  dev 6  flags -- dc/mlx5_0:1
#  resource 27 :  md 5  dev 6  flags -- dc_mlx5/mlx5_0:1
#  resource 28 :  md 5  dev 6  flags -- ud/mlx5_0:1
#  resource 29 :  md 5  dev 6  flags -- ud_mlx5/mlx5_0:1
#  resource 30 :  md 5  dev 6  flags -- cm/mlx5_0:1
#  resource 31 :  md 6  dev 7  flags -s rdmacm/sockaddr
#  resource 32 :  md 7  dev 8  flags -- mm/sysv
#  resource 33 :  md 8  dev 9  flags -- mm/posix
#  resource 34 :  md 9  dev 10 flags -- cma/cma
#  resource 35 :  md 10 dev 11 flags -- knem/knem
#

Looking forward to your reply.
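
For reference, a sketch of the failing setup described above, plus one 
experiment that seems worth trying given that the node exposes four mlx5 HCAs 
while UCX_NET_DEVICES is left at "all": pinning UCX to a single device/port. 
The device name below is only an example taken from the dump, not a verified 
fix, and the binary is a placeholder.

#!/bin/bash
# Failing case: direct point-to-point connections over UCX in the Slurm
# pmix plugin (settings taken from the report above).
export SLURM_PMIX_DIRECT_CONN=true
export SLURM_PMIX_DIRECT_CONN_UCX=true
# Experiment (assumption, not a verified fix): restrict UCX to one HCA/port
# instead of "all"; mlx5_0:1 is just one device from the resource list above.
export UCX_NET_DEVICES=mlx5_0:1
srun --mpi=pmix_v3 ./your_mpi_app     # placeholder binary ("xxx" in the report)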


[slurm-users] why is the performance of an MPI allreduce job lower than expected?

2020-12-24 Thread hu...@sugon.com
Dear all,
We tested an MPI allreduce job in three modes (srun-dtcp, mpirun-slurm, 
mpirun-ssh), and we found that the running time in the mpirun-ssh mode is 
shorter than in the other two modes.
We've set parameters like below:
/usr/lib/systemd/system/slurmd.service:
LimitMEMLOCK=infinity
LimitSTACK=infinity
/etc/sysconfig/slurmd:
ulimit -l unlimited
ulimit -s unlimited
We want to know whether this is normal. Will features such as cgroup and 
pam_slurm_adopt limit job performance, and how can we improve the efficiency 
of Slurm jobs?
Here are the test results (each row is the message size in bytes followed by 
the srun-dtcp, mpirun-slurm and mpirun-ssh times):
size      srun-dtcp      mpirun-slurm      mpirun-ssh
00.050.060.05
42551.83355.67281.02
81469.322419.97139
1667.41151.871000.15
3273.31143.15126.22
64107.14111.6126.3
12877.12473.62344.36
25646.39417.9565.09
51292.84260.6990.5
102497.9137.13112.3
2048286.27233.21169.59
4096155.69343.59160.9
8192261.02465.78151.29
1638412518.0413363.713869.58
3276822071.611398.214842.32
655366041.2.953368.58
13107210436.1118071.5310909.76
26214413802.2224728.788263.53
52428813086.2616394.724825.51
104857628193.0815943.266490.29
209715263277.7324411.5815361.7
419430458538.0560516.1533955.49

(1)srun-dtcp job:
#!/bin/bash
#SBATCH -J test
#SBATCH -N 32
#SBATCH --ntasks-per-node=30
#SBATCH -p seperate

NP=$SLURM_NTASKS
srun --mpi=pmix_v3 /public/software/benchmark/imb/hpcx/2017/IMB-MPI1 -npmin $NP Allreduce

(2)mpirun-slurm job:
#!/bin/bash
#SBATCH -J test
#SBATCH -N 32
#SBATCH --ntasks-per-node=30
#SBATCH -p seperate

NP=$SLURM_NTASKS
mpirun /public/software/benchmark/imb/hpcx/2017/IMB-MPI1 -npmin $NP Allreduce

(3)mpirun-ssh job:
#!/bin/bash
#SBATCH -J test
#SBATCH -N 32
#SBATCH --ntasks-per-node=30
#SBATCH -p seperate

env | grep SLURM > env.log 
scontrol show hostname > nd.$SLURM_JOBID
NNODE=$SLURM_NNODES
NP=$SLURM_NTASKS

mpirun -np $NP -machinefile nd.$SLURM_JOBID -map-by ppr:30:node \
-mca plm rsh -mca plm_rsh_no_tree_spawn 1 -mca plm_rsh_num_concurrent $NNODE \
/public/software/benchmark/imb/hpcx/2017/IMB-MPI1 -npmin $NP Allreduce
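
One variable worth ruling out between these three modes is process placement 
and binding, since unbound or differently bound ranks can account for large 
Allreduce differences. A sketch using standard srun and Open MPI options, 
meant to be dropped into the same batch scripts (so $NP and the IMB path come 
from above); whether binding explains the gap here is only a guess:

# srun side: bind tasks to cores and print the chosen binding
srun --mpi=pmix_v3 --cpu-bind=verbose,cores \
     /public/software/benchmark/imb/hpcx/2017/IMB-MPI1 -npmin $NP Allreduce

# mpirun side (Open MPI): request the equivalent binding and report it
mpirun -np $NP --map-by ppr:30:node --bind-to core --report-bindings \
       /public/software/benchmark/imb/hpcx/2017/IMB-MPI1 -npmin $NP Allreduce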



Best wishes!
menglong