[slurm-users] Re: Restricting local disk storage of jobs

2024-02-07 Thread Tim Schneider via slurm-users
…execution times.  The main question is "where does the tmpfs plugin find the quota limit for the job?" On Feb 6, 2024, at 08:39, Tim Schneider via slurm-users wrote: Hi, In our SLURM cluster, we are using the job_container/tmpfs plugin to ensure that each user can use /tmp…

[slurm-users] Re: [ext] Restricting local disk storage of jobs

2024-02-06 Thread Tim Schneider via slurm-users
+0100, Tim Schneider wrote: Hi Magnus, thanks for your reply! If you can, would you mind sharing the InitScript of your attempt at getting it to work? Best, Tim On 06.02.24 15:19, Hagdorn, Magnus Karl Moritz wrote: Hi Tim, we are using the container/tmpfs plugin to map /tmp to a local NVMe

[slurm-users] Re: [ext] Restricting local disk storage of jobs

2024-02-06 Thread Tim Schneider via slurm-users
…lot of local scratch space. I don't think this happens very often if at all. Regards magnus [1] https://slurm.schedmd.com/job_container.conf.html#OPT_InitScript On Tue, 2024-02-06 at 14:39 +0100, Tim Schneider via slurm-users wrote: Hi, In our SLURM cluster, we are using the job_container/tmpfs plugin…
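The InitScript hook referenced above ([1]) is where a per-job quota on local scratch could be applied. As a sketch (not taken from the thread): assuming BasePath points at an XFS filesystem mounted with prjquota, and assuming SLURM_JOB_ID is present in the script's environment, an InitScript could set an XFS project quota on the job's directory:

```shell
#!/bin/bash
# Hypothetical InitScript for job_container.conf (configured via InitScript=).
# Assumptions: BasePath=/local/tmpfs_jobs lies on XFS mounted with prjquota,
# and SLURM_JOB_ID is exported to this script by the plugin.
set -euo pipefail

BASE=/local/tmpfs_jobs            # must match BasePath in job_container.conf
LIMIT=50g                         # per-job scratch cap (site policy, illustrative)
JOBDIR="$BASE/$SLURM_JOB_ID"

# Use the job ID as the XFS project ID and cap its block usage.
xfs_quota -x -c "project -s -p $JOBDIR $SLURM_JOB_ID" "$BASE"
xfs_quota -x -c "limit -p bhard=$LIMIT $SLURM_JOB_ID" "$BASE"
```

The paths, the 50g limit, and the use of the job ID as project ID are all illustrative; check the job_container.conf documentation for what your Slurm version actually passes to InitScript.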

[slurm-users] Restricting local disk storage of jobs

2024-02-06 Thread Tim Schneider via slurm-users
Hi, In our SLURM cluster, we are using the job_container/tmpfs plugin to ensure that each user can use /tmp and it gets cleaned up after them. Currently, we are mapping /tmp into the nodes RAM, which means that the cgroups make sure that users can only use a certain amount of storage inside /tmp…
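For reference, a minimal configuration for the setup described in this message might look as follows (illustrative values, not quoted from the thread; JobContainerType belongs in slurm.conf, the rest in job_container.conf):

```
# slurm.conf
JobContainerType=job_container/tmpfs

# job_container.conf
AutoBasePath=true
BasePath=/local/scratch    # node-local path; the thread's setup maps this into RAM
```

With BasePath on a RAM-backed filesystem, the memory cgroup bounds usage as described above; moving it onto local disk is exactly what raises the quota question in this thread.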

Re: [slurm-users] slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK).

2024-01-24 Thread Tim Schneider
…worked until this recent change, then other kernel versions should show the same behavior. But as far as I can tell it still works just fine with newer kernels. Cheers, Stefan On Tue, 23 Jan 2024 15:20:56 +0100 Tim Schneider wrote: Hi, I have filed a bug report with SchedMD (https://bugs.schedmd.

Re: [slurm-users] slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK).

2024-01-23 Thread Tim Schneider
…we should check? On Thu, Jan 4, 2024 at 3:03 PM Tim Schneider wrote: Hi, I am using SLURM 22.05.9 on a small compute cluster. Since I reinstalled two of our nodes, I get the following error when launching a job: slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK)…

[slurm-users] slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK).

2024-01-04 Thread Tim Schneider
Hi, I am using SLURM 22.05.9 on a small compute cluster. Since I reinstalled two of our nodes, I get the following error when launching a job: slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK). Also the cgroups do not seem
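The error text points at the locked-memory limit of the slurmd process. A commonly suggested remedy (an assumption here, not the thread's confirmed resolution) is to raise MEMLOCK for slurmd via a systemd drop-in on the affected nodes:

```
# /etc/systemd/system/slurmd.service.d/memlock.conf
[Service]
LimitMEMLOCK=infinity
```

followed by `systemctl daemon-reload && systemctl restart slurmd`.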

Re: [slurm-users] scontrol reboot does not allow new jobs to be scheduled if nextstate=RESUME is set

2023-10-25 Thread Tim Schneider
then nextstate is irrelevant. We always use "reboot ASAP" because our cluster is usually so busy that nodes never become idle if left to themselves :-) FYI: We regularly make package updates and firmware updates using the "scontrol reboot asap" method which is explained in

Re: [slurm-users] scontrol reboot does not allow new jobs to be scheduled if nextstate=RESUME is set

2023-10-25 Thread Tim Schneider
ion! Best, tim On 25.10.23 02:10, Christopher Samuel wrote: On 10/24/23 12:39, Tim Schneider wrote: Now my issue is that when I run "scontrol reboot ASAP nextstate=RESUME ", the node goes in "mix@" state (not drain), but no new jobs get scheduled until the node reboots. Esse

[slurm-users] scontrol reboot does not allow new jobs to be scheduled if nextstate=RESUME is set

2023-10-24 Thread Tim Schneider
Hi, from my understanding, if I run "scontrol reboot <nodelist>", the node should continue to operate as usual and reboot once it is idle. When adding the ASAP flag (scontrol reboot ASAP <nodelist>), the node should go into drain state and not accept any more jobs. Now my issue is that when I run "scontrol reboot ASAP nextstate=RESUME <nodelist>"…
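For context, the variants discussed in this thread can be summarized as follows (node names are placeholders):

```shell
# Reboot when the node next becomes idle; keep scheduling jobs onto it:
scontrol reboot nextstate=RESUME node01

# Drain the node and reboot as soon as its running jobs finish:
scontrol reboot ASAP nextstate=RESUME node01

# Inspect what state the node ended up in:
scontrol show node node01 | grep -iE 'state|reason'
```

The reported problem is that the second form stops new jobs from being scheduled even though nextstate=RESUME is set.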

Re: [slurm-users] task/cgroup plugin causes "srun: error: task 0 launch failed: Plugin initialization failed" error on Ubuntu 22.04

2023-06-17 Thread Tim Schneider
Tim Schneider wrote: Hi again, I just realized that someone in https://groups.google.com/g/slurm-users/c/0dJhe5r6_2Q?pli=1 wrote at some point that he built Slurm 22 instead of using the Ubuntu repo version. So I guess I will have to look into that. Best, Tim On 6/16/23 10:36, Tim Schneider wrote: Hi Abel

[slurm-users] Fwd: task/cgroup plugin causes "srun: error: task 0 launch failed: Plugin initialization failed" error on Ubuntu 22.04

2023-06-15 Thread Tim Schneider
Hi, I am maintaining the SLURM cluster of my research group. Recently I updated to Ubuntu 22.04 and Slurm 21.08.5 and ever since, I am unable to launch jobs. When launching a job, I receive the following error: /$ srun --nodes=1 --ntasks-per-node=1 -c 1 --mem-per-cpu 1G --time=01:00:00 --pty
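A likely culprit, consistent with the follow-up above about building Slurm 22 (a hedged guess, not the thread's confirmed fix): Ubuntu 22.04 boots with cgroup v2 by default, while the distro-packaged Slurm 21.08.5 task/cgroup plugin expects cgroup v1. Two workarounds are commonly used: build Slurm >= 22.05, which ships cgroup/v2 support, or force the legacy hierarchy at boot:

```
# /etc/default/grub -- append to the existing GRUB_CMDLINE_LINUX value
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=false"
# then: update-grub && reboot
```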