[slurm-users] Whether slurm can support GPU MIG feature

2020-11-10 Thread Chaofeng Zhang
Can Slurm be configured to use the multiple GPU instances of the A100? I can’t add the same device multiple times in gres.conf; this was supported in Slurm 18 but is not supported in Slurm 20. cat gres.conf Name=gpu File=/dev/nvidia0 Name=gpu File=/dev/nvidia0 Name=gpu File=/dev/nvidia0 Name=gpu Fi

[slurm-users] failed to send msg type 6002: No route to host

2020-11-10 Thread Patrick Bégou
Hi, I'm new to Slurm (as admin) and I need some help. Testing my initial setup with: [begou@tenibre ~]$ salloc -n 1 sh salloc: Granted job allocation 11 sh-4.4$ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)

Re: [slurm-users] Tracking maximum memory via cgroup

2020-11-10 Thread Patrik Andersson
Looking into this more, it looks like memory.max_usage_in_bytes and memory.usage_in_bytes also count file cache, which is very surprising and not at all useful. But total_rss in memory.stat shows a more correct number. Looking at that one for a real job gives me around 30 GB, which matches my other d
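
For readers who want to check the same counter, here is a minimal sketch of pulling total_rss out of the text of a cgroup v1 memory.stat file. The function names and the sample values are illustrative assumptions, not taken from the thread, and the location of memory.stat for a given job depends on the site's cgroup layout, so the sketch works on the file's text rather than on a fixed path.

```python
def parse_memory_stat(text):
    """Parse the 'key value' lines of a cgroup v1 memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name integer" lines; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def total_rss_bytes(text):
    """Return the total_rss counter in bytes, or None if it is absent."""
    return parse_memory_stat(text).get("total_rss")


# Illustrative sample only (hypothetical values, not real job data).
SAMPLE = """cache 1048576
rss 2097152
total_cache 4194304
total_rss 8388608
"""

if __name__ == "__main__":
    print(total_rss_bytes(SAMPLE))
```

In practice the text would come from reading the job's memory.stat file while the job's cgroup still exists; memory.stat reports page cache in separate counters (cache, total_cache), which is why total_rss differs from the usage_in_bytes figures.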

Re: [slurm-users] failed to send msg type 6002: No route to host

2020-11-10 Thread Brian Andrus
This looks like it may be trying to do something using MPI. What does your slurm.conf look like for that node? Brian Andrus On 11/10/2020 2:54 AM, Patrick Bégou wrote: Hi, I'm new to slurm (as admin) and I need some help. Testing my initial setup with: [begou@tenibre ~]$ salloc -n 1