[slurm-users] mpich and sbatch

2022-12-05 Thread stephen tjemkes

Hi,


Maybe this has been asked several times, and a solution might be readily 
available.


Facility topology: 7 identical machines (one head node and 6 clients), each 
with 16 GB RAM and 8 cores. These are virtual machines; the bare-metal host 
configuration is unfortunately not known to me.


The slurm.conf is listed below.

Use case: I have 30 different scripts; each script launches an application 
in a separate partition of a shared disk.


All input files are copied from a repository into the specific partition, 
and the application is launched with mpirun -n 8 executable.
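
For illustration, one of these scripts might look roughly like the sketch 
below (the paths, directory names, and executable name are placeholders I 
have assumed, not taken from the post):

#!/bin/ksh
# Stage the input files from the repository into this script's
# partition of the shared disk (locations assumed for illustration).
cp -r /shared/repo/case01/. /shared/work/case01/
cd /shared/work/case01

# Launch the application across the node's 8 cores, as described.
mpirun -n 8 ./executable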

If I submit a single instance of the ksh script to the Slurm batch system 
(sbatch ksh-script), the system happily uses all 8 cores at 100% for the 
user.
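
For reference, the resource request can also be made explicit at submission 
time; the flags below are an assumed match for "one node, 8 tasks", not 
something stated in the post:

sbatch --nodes=1 --ntasks=8 ksh-script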


However, if I submit more than, say, 4 instances, the jobs are distributed 
across the various nodes, but nmon or htop shows each of the 8 cores at 
100% utilization, split roughly 25% user time and 75% steal.
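
As an aside, per-core steal time can be watched with mpstat from the 
sysstat package; the %steal column reports time the hypervisor withheld 
from the guest:

mpstat -P ALL 1    # one sample per second for every core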


The question is whether this is the result of a Slurm setting (if so, 
which setting should I add to the conf file),


or whether this is an issue with the configuration of the bare-metal 
machine.


Many thanks,

Stephen



# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=cluster
SlurmctldHost=***
MpiDefault=none
ProctrackType=proctrack/linuxproc
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm-wlm/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm-wlm/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-wlm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm-wlm/slurmd.log
#
# COMPUTE NODES
NodeName=*** NodeAddr=** CPUs=1 RealMemory=16000 State=UNKNOWN
PartitionName=w4repp Nodes=ALL Default=YES MaxTime=INFINITE State=UP

#PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP
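
For comparison with the 8-core machines described above, a node definition 
that exposes all cores to Slurm could look like the sketch below (the 
hostname, address, and socket/core split are assumptions, not values from 
the post):

NodeName=*** NodeAddr=*** CPUs=8 Sockets=1 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=16000 State=UNKNOWN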




[slurm-users] SC'22 Presentations Online; SLUG'23 will be at BYU Sept. 2023

2022-12-05 Thread Tim Wickberg

Two quick announcements I wanted to share:

Presentations from SC'22 in Dallas are in the publication archive now:
https://slurm.schedmd.com/publications.html

The Slurm User Group Meeting ("SLUG'23") will be held in person in 
Provo, Utah, at Brigham Young University in September 2023. We're still 
working to finalize the exact dates, and will have a call for 
presentations out in the spring.


- Tim

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support