[slurm-users] Re: canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?

2024-02-26 Thread Ward Poelmans via slurm-users
Hi, On 26/02/2024 09:27, Josef Dvoracek via slurm-users wrote: Is anybody using something more advanced that is still understandable by a casual HPC user? I'm not sure it qualifies but: sbatch --wrap 'screen -D -m' then srun --jobid <jobid> --pty screen -rd. Or: sbatch -J screen --wrap 'screen -D -m'
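
Spelled out as a sketch (the resource flags, the squeue check and the scancel at the end are illustrative additions, not from Ward's mail):

    # Submit a job whose only task is a detached screen session;
    # the resource flags here are placeholders for whatever you need.
    jobid=$(sbatch --parsable -J screen -c 4 --time=4:00:00 --wrap 'screen -D -m')

    # Once squeue shows the job running, attach to that screen session
    # on the allocated node from the login node (some configurations
    # may additionally need srun --overlap here).
    squeue -j "$jobid"
    srun --jobid "$jobid" --pty screen -rd

    # Detach with Ctrl-a d; the job (and the screen inside it) keeps
    # running until the walltime runs out or you cancel it:
    scancel "$jobid"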

[slurm-users] Re: Slurm Cleaning Up $XDG_RUNTIME_DIR Before It Should?

2024-05-15 Thread Ward Poelmans via slurm-users
Hi, This is systemd, not slurm. We've also seen it being created and removed. As far as I understood, it is something about the session that systemd cleans up. We've worked around it by adding this to the prolog: MY_XDG_RUNTIME_DIR=/dev/shm/${USER}; mkdir -p $MY_XDG_RUNTIME_DIR; echo "export XDG_RUNTIME_DIR=…
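
That workaround, written out as a sketch (the script path, and the assumption that it runs as a TaskProlog so that echoed "export NAME=value" lines end up in the job environment, are mine, not from the original mail):

    #!/bin/bash
    # Hypothetical /etc/slurm/task_prolog.sh: point XDG_RUNTIME_DIR at a
    # per-user directory that systemd will not clean up mid-job.
    MY_XDG_RUNTIME_DIR=/dev/shm/${USER}
    mkdir -p "${MY_XDG_RUNTIME_DIR}"
    # In a TaskProlog, stdout lines of the form "export NAME=value" are
    # added to the environment of the user's tasks.
    echo "export XDG_RUNTIME_DIR=${MY_XDG_RUNTIME_DIR}"

It would then be referenced from slurm.conf with something like TaskProlog=/etc/slurm/task_prolog.sh.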

[slurm-users] Re: Using sharding

2024-07-04 Thread Ward Poelmans via slurm-users
Hi Ricardo, It should show up like this: Gres=gpu:gtx_1080_ti:4(S:0-1),shard:gtx_1080_ti:16(S:0-1) CfgTRES=cpu=32,mem=515000M,billing=130,gres/gpu=4,gres/shard=16 AllocTRES=cpu=8,mem=31200M,gres/shard=1 I can't directly spot any error, however. Our gres.conf is simply `AutoDetect=nvml` …
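
For reference, a sketch of a matching shard setup and request (the node name and the 4-GPU/16-shard split are illustrative, not Ricardo's actual configuration):

    # gres.conf (as in Ward's setup, GPUs autodetected via NVML):
    #   AutoDetect=nvml
    # slurm.conf (advertise both the GPUs and the shards on the node):
    #   GresTypes=gpu,shard
    #   NodeName=node001 Gres=gpu:gtx_1080_ti:4,shard:16 ...

    # Request one shard (a slice of a GPU) and check what was allocated:
    sbatch --gres=shard:1 --wrap 'nvidia-smi'
    scontrol show node node001 | grep -E 'Gres|TRES'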

[slurm-users] Re: Using sharding

2024-07-05 Thread Ward Poelmans via slurm-users
Hi Arnuld, On 5/07/2024 13:56, Arnuld via slurm-users wrote: It should show up like this: Gres=gpu:gtx_1080_ti:4(S:0-1),shard:gtx_1080_ti:16(S:0-1) What's the meaning of (S:0-1) here? The sockets to which the GPUs are associated: If GRES are associated with specific sockets, …
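
To illustrate where that socket information comes from, a hand-written gres.conf sketch (device files and core ranges are invented for a hypothetical dual-socket node; with AutoDetect the same mapping is picked up automatically):

    # Cores=0-15 sit on socket 0, Cores=16-31 on socket 1, so the four
    # GPUs of this type span sockets 0-1 and scontrol reports (S:0-1).
    Name=gpu Type=gtx_1080_ti File=/dev/nvidia0 Cores=0-15
    Name=gpu Type=gtx_1080_ti File=/dev/nvidia1 Cores=0-15
    Name=gpu Type=gtx_1080_ti File=/dev/nvidia2 Cores=16-31
    Name=gpu Type=gtx_1080_ti File=/dev/nvidia3 Cores=16-31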

[slurm-users] Re: Access to --constraint= in Lua cli_filter?

2024-08-19 Thread Ward Poelmans via slurm-users
Hi Kevin, On 19/08/2024 08:15, Kevin Buckley via slurm-users wrote: If I supply a --constraint= option to an sbatch/salloc/srun, does the arg appear inside any object that a Lua CLI Filter could access? Have a look if you can spot them in: function slurm_cli_pre_submit(options, pack_offset) …
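
A minimal sketch of that approach (the key name "constraint", the slurm.log_info helper and the idea of dumping every option are my assumptions; the example cli_filter.lua shipped with Slurm is the authoritative reference):

    function slurm_cli_pre_submit(options, pack_offset)
        -- Print every option the CLI hands to the filter, so you can see
        -- under which key --constraint arrives (expected: options["constraint"]).
        for k, v in pairs(options) do
            slurm.log_info("cli option %s = %s", k, tostring(v))
        end
        return slurm.SUCCESS
    end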

[slurm-users] Re: A note on updating Slurm from 23.02 to 24.05 & multi-cluster

2024-09-26 Thread Ward Poelmans via slurm-users
Hi Bjørn-Helge, On 26/09/2024 09:50, Bjørn-Helge Mevik via slurm-users wrote: Ward Poelmans via slurm-users writes: We hit a snag when updating our clusters from Slurm 23.02 to 24.05. After updating the slurmdbd, our multi-cluster setup was broken until everything was updated to 24.05. We …

[slurm-users] A note on updating Slurm from 23.02 to 24.05 & multi-cluster

2024-09-25 Thread Ward Poelmans via slurm-users
Hi all, We hit a snag when updating our clusters from Slurm 23.02 to 24.05. After updating the slurmdbd, our multi-cluster setup was broken until everything was updated to 24.05. We had not anticipated this. SchedMD says that fixing it would be a very complex operation. Hence, this warning to …
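
For anyone planning the same jump, a sketch of how one might check for version skew across clusters before and after each step (cluster and node names are placeholders; which columns your sacctmgr output shows may differ per site):

    # List the clusters registered in slurmdbd together with the RPC
    # (protocol) version each slurmctld reported; mismatches here are a
    # hint that -M/--clusters requests may misbehave.
    sacctmgr show clusters format=Cluster,ControlHost,RPC

    # Compare daemon versions on each cluster directly.
    scontrol version
    sinfo --version

    # Quick functional check of the multi-cluster path:
    squeue -M all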