[slurm-users] Whether slurm can support GPU MIG feature

2020-11-10 Thread Chaofeng Zhang
Can Slurm be configured to use the multiple GPU instances (MIG) of an A100?
I cannot add the same device multiple times to gres.conf; this was supported
in Slurm 18, but is not supported in Slurm 20.
cat gres.conf
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia0
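For contrast, a minimal gres.conf sketch for four separate physical GPUs, each with
its own device file (the Type tag and the device range are illustrative assumptions,
not taken from this post):

# Hypothetical non-MIG layout: distinct device files, so no duplicate entries needed.
Name=gpu Type=a100 File=/dev/nvidia[0-3]

MIG instances, by contrast, all sit behind the same parent /dev/nvidia0 device node,
which is why the duplicated entries above are rejected.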


Thanks

Jeff (ChaoFeng Zhang, 张超锋) PMP®   
zhang...@lenovo.com
HPC&AI | Cloud Software Architect  (+86) - 18116117420
Software solution development (+8621) - 20590223
Shanghai, China




[slurm-users] failed to send msg type 6002: No route to host

2020-11-10 Thread Patrick Bégou
Hi,

I'm new to slurm (as admin) and I need some help. Testing my initial
setup with:

[begou@tenibre ~]$ salloc -n 1 sh
salloc: Granted job allocation 11
sh-4.4$ squeue
 JOBID PARTITION NAME  USER ST   TIME  NODES NODELIST(REASON)
    11       all   sh begou  R   0:16      1 tenibre-0-0
sh-4.4$ srun /usr/bin/hostname
srun: error: timeout waiting for task launch, started 0 of 1 tasks
srun: Job step 11.0 aborted before step completely launched.
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete
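As a small generic sketch (nothing cluster-specific assumed), re-running the step with
extra client-side verbosity shows srun's own view of the launch:

sh-4.4$ srun -vv /usr/bin/hostname

Repeated -v flags only raise srun's logging level; the step itself is unchanged.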

I checked the connections:

tenibre is the login node (no daemon running)

nc -v tenibre-0-0 6818
nc -v management1 6817

management1 is the management node (slurmctld running)

nc -v tenibre-0-0 6818

tenibre-0-0 is the first compute node (slurmd running)

nc -v management1 6817

All tests return "Ncat: Connected...".

The command "id begou" works on all nodes and I can reach my home
directory on the login node and on the compute node.
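One direction the nc tests above do not cover is the compute node reaching back to srun
on the login node: the slurmd log below shows the step trying to connect its I/O and to
return the type 6002 launch response to HOST:172.30.1.254, on ports that srun chooses at
run time. A rough sketch of what could be checked from tenibre-0-0 (the firewalld commands
are an assumption about the login node's setup, not something stated in this post):

ping -c 1 172.30.1.254
# "No route to host" (EHOSTUNREACH) usually comes from an ICMP reject, e.g. a
# host firewall on the login node, rather than a genuinely missing route.
# On the login node, if firewalld is in use:
firewall-cmd --state
firewall-cmd --list-all
# srun listens on ephemeral ports unless SrunPortRange is set in slurm.conf.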

On the compute node, slurmd.log shows:

[2020-11-10T11:21:38.050] launch task 11.0 request from UID:23455 GID:1036 HOST:172.30.1.254 PORT:42220
[2020-11-10T11:21:38.050] debug:  Checking credential with 508 bytes of sig data
[2020-11-10T11:21:38.050] _run_prolog: run job script took usec=12
[2020-11-10T11:21:38.050] _run_prolog: prolog with lock for job 11 ran for 0 seconds
[2020-11-10T11:21:38.053] debug:  AcctGatherEnergy NONE plugin loaded
[2020-11-10T11:21:38.053] debug:  AcctGatherProfile NONE plugin loaded
[2020-11-10T11:21:38.053] debug:  AcctGatherInterconnect NONE plugin loaded
[2020-11-10T11:21:38.053] debug:  AcctGatherFilesystem NONE plugin loaded
[2020-11-10T11:21:38.053] debug:  switch NONE plugin loaded
[2020-11-10T11:21:38.054] [11.0] debug:  Job accounting gather NOT_INVOKED plugin loaded
[2020-11-10T11:21:38.054] [11.0] debug:  Message thread started pid = 12099
[2020-11-10T11:21:38.054] debug:  task_p_slurmd_reserve_resources: 11 0
[2020-11-10T11:21:38.068] [11.0] debug:  task NONE plugin loaded
[2020-11-10T11:21:38.068] [11.0] debug:  Checkpoint plugin loaded: checkpoint/none
[2020-11-10T11:21:38.068] [11.0] Munge credential signature plugin loaded
[2020-11-10T11:21:38.068] [11.0] debug:  job_container none plugin loaded
[2020-11-10T11:21:38.068] [11.0] debug:  mpi type = pmi2
[2020-11-10T11:21:38.068] [11.0] debug:  xcgroup_instantiate: cgroup '/sys/fs/cgroup/freezer/slurm' already exists
[2020-11-10T11:21:38.068] [11.0] debug:  spank: opening plugin stack /etc/slurm/plugstack.conf
[2020-11-10T11:21:38.068] [11.0] debug:  mpi type = (null)
[2020-11-10T11:21:38.068] [11.0] debug:  using mpi/pmi2
[2020-11-10T11:21:38.068] [11.0] debug:  _setup_stepd_job_info: SLURM_STEP_RESV_PORTS not found in env
[2020-11-10T11:21:38.068] [11.0] debug:  mpi/pmi2: setup sockets
[2020-11-10T11:21:38.069] [11.0] debug:  mpi/pmi2: started agent thread
[2020-11-10T11:21:38.069] [11.0] error: connect io: No route to host
[2020-11-10T11:21:38.069] [11.0] error: IO setup failed: No route to host
[2020-11-10T11:21:38.069] [11.0] debug:  step_terminate_monitor_stop signaling condition
[2020-11-10T11:21:38.069] [11.0] error: job_manager exiting abnormally, rc = 4021
[2020-11-10T11:21:38.069] [11.0] debug:  Sending launch resp rc=4021
[2020-11-10T11:21:38.069] [11.0] debug:  _send_srun_resp_msg: 0/5 failed to send msg type 6002: No route to host
[2020-11-10T11:21:38.169] [11.0] debug:  _send_srun_resp_msg: 1/5 failed to send msg type 6002: No route to host
[2020-11-10T11:21:38.370] [11.0] debug:  _send_srun_resp_msg: 2/5 failed to send msg type 6002: No route to host
[2020-11-10T11:21:38.770] [11.0] debug:  _send_srun_resp_msg: 3/5 failed to send msg type 6002: No route to host
[2020-11-10T11:21:39.570] [11.0] debug:  _send_srun_resp_msg: 4/5 failed to send msg type 6002: No route to host
[2020-11-10T11:21:40.370] [11.0] debug:  _send_srun_resp_msg: 5/5 failed to send msg type 6002: No route to host
[2020-11-10T11:21:40.372] [11.0] debug:  Message thread exited
[2020-11-10T11:21:40.372] [11.0] debug:  mpi/pmi2: agent thread exit
[2020-11-10T11:21:40.372] [11.0] done with job


But I do not understand what this "No route to host" error means.


Thanks for your help.

Patrick




Re: [slurm-users] Tracking maximum memory via cgroup

2020-11-10 Thread Patrik Andersson
Looking into this more, it turns out that memory.max_usage_in_bytes and
memory.usage_in_bytes also count file cache, which is surprising and not very
useful here. The total_rss field in memory.stat gives a more accurate number:
for a real job it shows around 30 GB, which matches my other data and
expectations. But my issue remains that sstat and sacct report only a few MB
for the memory stats.
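For reference, a small sketch of the comparison described above, using the cgroup
path from the quoted message below (uid and job id are just the examples from that
message):

cgdir=/sys/fs/cgroup/memory/slurm/uid_500/job_31626
# memory.max_usage_in_bytes includes page cache, so it overstates the job's RSS
cat "$cgdir/memory.max_usage_in_bytes"
# total_rss in memory.stat is the anonymous-memory figure discussed above
awk '$1 == "total_rss" {printf "total_rss: %.1f MiB\n", $2/1048576}' "$cgdir/memory.stat"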

On Mon, 9 Nov 2020 at 18:49, Patrik Andersson wrote:

> We are using cgroups to track resource usage of our jobs. The jobs are run
> in Docker, with Docker's --cgroup-parent flag pointing at the Slurm job's
> cgroup. This works great for limiting memory usage.
>
> Unfortunately the maximum memory usage, MaxRSS, is not accurately reported
> in sacct, while the cgroup's memory.max_usage_in_bytes does show accurate
> numbers.
>
> Looking at the cgroup:
>
>> /sys/fs/cgroup/memory/slurm/uid_500/job_31626/memory.max_usage_in_bytes:1132154880
>> # 1GB
>> /sys/fs/cgroup/memory/slurm/uid_500/job_31626/memory.use_hierarchy:1
>> /sys/fs/cgroup/memory/slurm/uid_500/job_31626/memory.stat:rss 0
>> /sys/fs/cgroup/memory/slurm/uid_500/job_31626/memory.stat:total_rss 524288
>>
>
> Looking at sacct:
>
>> $ sacct -j 31626 -o
>> jobid,AveRSS,MaxRSS,AveVMSize,MaxVMSize,ReqMem,TotalCPU
>>
>>JobID AveRSS MaxRSS  MaxVMSize
>> 31626.batch  28600K 28600K 77900K
>
>
> I expected that we would get some of the cgroup stats since we are using
> cgroup plugins.
>
> lines from slurm.conf
>
>> JobAcctGatherFrequency=30
>>
>> JobAcctGatherType=jobacct_gather/cgroup
>>
>> ProctrackType=proctrack/cgroup
>>
>> TaskPlugin=task/affinity,task/cgroup
>>
>> SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
>>
>
>  cgroup.conf
>
>> CgroupAutomount=yes
>>
>> CgroupMountpoint=/sys/fs/cgroup
>>
>>
>>
>> ### Task/cgroup Plugin ###
>>
>> # Constrain allowed cores to the subset of allocated resources.
>>
>> # This functionality makes use of the cpuset subsystem
>>
>> ConstrainCores=yes
>>
>> ConstrainKmemSpace=yes
>>
>> ConstrainRAMSpace=yes
>>
>> ConstrainSwapSpace=yes
>>
>> ConstrainDevices=no
>>
>> MinKmemSpace=30
>>
>> MinRAMSpace=30
>>
>> # Set a default task affinity to bind each step task to a subset of the
>>
>> # allocated cores using sched_setaffinity
>>
>> # /!\ This feature requires the Portable Hardware Locality (hwloc) library
>>
>> TaskAffinity=no
>>
>


Re: [slurm-users] failed to send msg type 6002: No route to host

2020-11-10 Thread Brian Andrus

This looks like it may be trying to do something using mpi.

What does your slurm.conf look like for that node?
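As a generic starting point (a sketch, not tailored to this cluster), the MPI default
and the node addressing that slurmctld has recorded can be dumped with:

scontrol show config | grep -i -e MpiDefault -e SrunPortRange
scontrol show node tenibre-0-0 | grep -i -e NodeAddr -e NodeHostName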

Brian Andrus

On 11/10/2020 2:54 AM, Patrick Bégou wrote: