Re: [slurm-users] How to share GPU resources? (MPS or another way?)

2019-10-09 Thread Kota Tsuyuzaki
Thanks! The bug report was very helpful. It seems we have to reconsider how to achieve what we want. Cheers, Kota

[slurm-users] How can I check the delay-boot option of the sbatch command?

2019-10-09 Thread Uemoto, Tomoki
Hi all, I don't understand in which cases the delay-boot option should be used. Should I check it as follows? 1. sbatch --delay-boot=
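A minimal command sketch of how the option is typically invoked (the script name and constraint here are hypothetical; per the sbatch documentation, --delay-boot tells Slurm how long a pending job may wait for nodes that already satisfy its feature request before rebooting nodes to provide those features):

```shell
# Hypothetical example: let the job wait up to 10 minutes for nodes that
# already have the "bigmem" feature before Slurm reboots nodes to supply it.
sbatch --delay-boot=10 --constraint=bigmem job.sh
```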

Re: [slurm-users] How to share GPU resources? (MPS or another way?)

2019-10-09 Thread Christopher Samuel
On 10/8/19 12:30 PM, Goetz, Patrick G wrote: It looks like GPU resources can only be shared by processes run by the same user? This is touched on in this bug https://bugs.schedmd.com/show_bug.cgi?id=7834 where it appears that, at one point, MPS worked for multiple users. It may be that
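For context, Slurm does expose MPS as a schedulable resource. A sketch of how a job would request a share of a GPU, assuming the cluster has gres/mps configured (GresTypes=gpu,mps in slurm.conf and matching gres.conf entries; the script name is hypothetical):

```shell
# Request 50 MPS shares (a configured fraction of one GPU's compute capacity)
# instead of a whole GPU via --gres=gpu:1.
sbatch --gres=mps:50 wrapper.sh
```

Note that Slurm's gres/mps shares are counts carved out of a single GPU, which is why the per-user restriction discussed in the bug report matters in practice.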

Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-09 Thread Matthew BETTINGER
Just curious whether this option, or the OOM setting (which we use), can leave nodes stuck in the CG "completing" state. We hit CG states quite often, and the only way to clear them is to reboot the node. I believe it occurs when the parent process dies, gets killed, or goes zombie (Z state)? Thanks. MB On 10/8/19, 6:11 AM, "slurm-users on behal

Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-09 Thread Jean-mathieu CHANTREIN
- Mail original - > Maybe I missed something else... That's right. Thanks to Bjørn-Helge, who helped me. You must enable swapaccount in the kernel as shown here: https://unix.stackexchange.com/questions/531480/what-does-swapaccount-1-in-grub-cmdline-linux-default-do By default, this is
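Following the linked answer, a sketch of enabling the kernel parameter on a Debian/Ubuntu-style system (paths and the exact GRUB workflow may differ on other distributions):

```shell
# Add swapaccount=1 to the default kernel command line so the cgroup
# memory controller accounts for swap, which Slurm's memory enforcement needs.
# Edit /etc/default/grub so the line reads, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="swapaccount=1 quiet"
sudo update-grub   # regenerate the GRUB config
sudo reboot        # the parameter only takes effect after a reboot
```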

[slurm-users] Application level checkpointing

2019-10-09 Thread Oytun Peksel
Hi, I would like to set up a queuing system for multiple users with limited resources. I'll have only 1 node and 48 CPUs to work with, so I am using select/cons_res as the select type. I have to use preemption because many jobs run in 3 different partitions with different priorities. T
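A hypothetical slurm.conf fragment matching this setup (one 48-CPU node, cons_res, partition-priority preemption); all node and partition names and priority values here are illustrative assumptions, not from the original message:

```conf
# Illustrative sketch only; adapt names, tiers, and PreemptMode to your site.
SelectType=select/cons_res
SelectTypeParameters=CR_Core
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

NodeName=node01 CPUs=48 State=UNKNOWN
PartitionName=high Nodes=node01 PriorityTier=3 PreemptMode=OFF
PartitionName=med  Nodes=node01 PriorityTier=2 Default=YES
PartitionName=low  Nodes=node01 PriorityTier=1
```

With preempt/partition_prio, jobs in a higher PriorityTier partition can preempt (here, requeue) jobs from lower tiers on the shared node.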

Re: [slurm-users] How to share GPU resources? (MPS or another way?)

2019-10-09 Thread Kota Tsuyuzaki
> On 10/8/19 1:47 AM, Kota Tsuyuzaki wrote: > > The GPU is running as well as gres gpu:1. Moreover, the NVIDIA docs seem > > to describe what I hit > > (https://docs.nvidia.com/deploy/mps/index.html#topic_4_3). It seems that an > mps-server will be created for each > user and the server will be r