execution times. The
main question is "where does the tmpfs plugin find the quota limit for
the job?"
On Feb 6, 2024, at 08:39, Tim Schneider via slurm-users wrote:
Hi,
In our SLURM cluster, we are using the job_container/tmpfs plugin to
ensure that each user can use /tmp
+0100, Tim Schneider wrote:
Hi Magnus,
thanks for your reply! If you can, would you mind sharing the
InitScript
of your attempt at getting it to work?
Best,
Tim
On 06.02.24 15:19, Hagdorn, Magnus Karl Moritz wrote:
Hi Tim,
we are using the container/tmpfs plugin to map /tmp to a local NVMe
lot of local scratch
space. I don't think this happens very often, if at all.
Regards
magnus
[1]
https://slurm.schedmd.com/job_container.conf.html#OPT_InitScript
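For concreteness, here is a minimal sketch of the kind of InitScript this
refers to, assuming the scratch filesystem is XFS mounted with prjquota. The
mount point, the fixed 20g limit, the directory layout and the availability
of SLURM_JOB_ID in the script's environment are all assumptions for
illustration; the question above of where a sensible per-job limit should
come from remains open.

   #!/bin/bash
   # Hypothetical InitScript: place the job's scratch directory under an XFS
   # project quota. All paths and the 20g limit are placeholders.
   SCRATCH_MNT=/mnt/nvme                        # assumed prjquota-enabled XFS mount
   JOB_DIR="$SCRATCH_MNT/slurm/$SLURM_JOB_ID"   # assumes SLURM_JOB_ID is set here
   PROJ_ID=$((100000 + SLURM_JOB_ID))           # derive a unique project id per job

   mkdir -p "$JOB_DIR"
   xfs_quota -x -c "project -s -p $JOB_DIR $PROJ_ID" "$SCRATCH_MNT"
   xfs_quota -x -c "limit -p bhard=20g $PROJ_ID" "$SCRATCH_MNT"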
On Tue, 2024-02-06 at 14:39 +0100, Tim Schneider via slurm-users wrote:
Hi,
In our SLURM cluster, we are using the job_container/tmpfs plugin
Hi,
In our SLURM cluster, we are using the job_container/tmpfs plugin to
ensure that each user can use /tmp and it gets cleaned up after them.
Currently, we are mapping /tmp into the node's RAM, which means that the
cgroups make sure that users can only use a certain amount of storage
inside /tmp.
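For readers who have not used this plugin, a minimal sketch of what such a
RAM-backed setup can look like; the paths are placeholders and the thread
does not show the actual configuration:

   # slurm.conf
   JobContainerType=job_container/tmpfs

   # job_container.conf
   AutoBasePath=true
   BasePath=/dev/shm/slurm   # placeholder; putting BasePath on a tmpfs keeps each
                             # job's private /tmp in RAM, so the job's cgroup memory
                             # limit also bounds how much it can write to /tmp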
worked until this recent change, then other kernel
versions should show the same behavior. But as far as I can tell
it still works just fine with newer kernels.
Cheers,
Stefan
On Tue, 23 Jan 2024 15:20:56 +0100
Tim Schneider wrote:
Hi,
I have filed a bug report with SchedMD
(https://bugs.schedmd.
we should check?
On Thu, Jan 4, 2024 at 3:03 PM Tim Schneider wrote:
Hi,
I am using SLURM 22.05.9 on a small compute cluster. Since I
reinstalled two of our nodes, I get the following error when
launching a job:
slurmstepd: error: load_ebpf_prog: BPF load error (No space left on
device). Please check your system limits (MEMLOCK).
Hi,
I am using SLURM 22.05.9 on a small compute cluster. Since I reinstalled
two of our nodes, I get the following error when launching a job:
slurmstepd: error: load_ebpf_prog: BPF load error (No space left on
device). Please check your system limits (MEMLOCK).
Also the cgroups do not seem
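The error message above points at the locked-memory limit (RLIMIT_MEMLOCK)
of the slurmd process. A quick way to inspect it, and a common way to raise
it when slurmd runs under systemd, is sketched below; the drop-in path and
the value are examples, not taken from this thread:

   # What limit does the running slurmd actually have?
   grep "Max locked memory" /proc/$(pidof slurmd)/limits

   # If it is low, raise it with a systemd drop-in, e.g.
   # /etc/systemd/system/slurmd.service.d/memlock.conf containing:
   #   [Service]
   #   LimitMEMLOCK=infinity
   systemctl daemon-reload
   systemctl restart slurmd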
then nextstate is
irrelevant.
We always use "reboot ASAP" because our cluster is usually so busy that
nodes never become idle if left to themselves :-)
FYI: We regularly make package updates and firmware updates using the
"scontrol reboot asap" method which is explained in
ion!
Best,
tim
On 25.10.23 02:10, Christopher Samuel wrote:
On 10/24/23 12:39, Tim Schneider wrote:
Now my issue is that when I run "scontrol reboot ASAP nextstate=RESUME
<nodename>", the node goes into the "mix@" state (not drain), but no new
jobs get scheduled until the node reboots. Essentially
Hi,
from my understanding, if I run "scontrol reboot <nodename>", the node
should continue to operate as usual and reboot once it is idle. When
adding the ASAP flag (scontrol reboot ASAP <nodename>), the node should
go into drain state and not accept any more jobs.
Now my issue is that when I run "scontrol reboot
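For reference, the two command forms being compared in this thread (the node
name is a placeholder):

   scontrol reboot <nodename>                        # reboot once the node becomes idle on its own
   scontrol reboot ASAP nextstate=RESUME <nodename>  # drain immediately, reboot as soon as the
                                                     # running jobs finish, then return to service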
Tim Schneider wrote:
Hi again,
I just realized that the author of
https://groups.google.com/g/slurm-users/c/0dJhe5r6_2Q?pli=1 wrote at
some point that he built Slurm 22 instead of using the Ubuntu repo
version. So I guess I will have to look into that.
Best,
Tim
On 6/16/23 10:36, Tim Schneider wrote:
Hi Abel
Hi,
I am maintaining the SLURM cluster of my research group. Recently I
updated to Ubuntu 22.04 and Slurm 21.08.5, and ever since I have been unable to
launch jobs. When launching a job, I receive the following error:
$ srun --nodes=1 --ntasks-per-node=1 -c 1 --mem-per-cpu 1G
--time=01:00:00 --pty