[slurm-users] Re: sbatch and --nodes

2024-05-31 Thread Hermann Schwärzler via slurm-users
Hi Michael, if you submit a job array, all resource-related options (number of nodes, tasks, CPUs per task, memory, time, ...) apply *per array task*. So in your case you start 100 array tasks (you could also call them "sub-jobs"), and *each* of them (not your whole job) is limited to one node, on

[slurm-users] sbatch and --nodes

2024-05-31 Thread Michael DiDomenico via slurm-users
It's Friday and I'm either doing something silly or have a misconfig somewhere; I can't figure out which. When I run `sbatch --nodes=1 --cpus-per-task=1 --array=1-100 --output test_%A_%a.txt --wrap 'uname -n'`, sbatch doesn't seem to be adhering to the --nodes param. When I look at my output files I
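One way to see how the array tasks were actually placed is to tally the hostnames written by `uname -n` across the `test_%A_%a.txt` output files. The following is a minimal sketch, assuming each output file contains only that single hostname line and that the files sit in the current directory; the helper name `tally_nodes` is hypothetical.

```python
import glob
from collections import Counter


def tally_nodes(pattern="test_*_*.txt"):
    """Count how many array tasks ran on each node.

    Assumes each output file holds the single line printed by `uname -n`.
    """
    counts = Counter()
    for path in glob.glob(pattern):
        with open(path) as fh:
            host = fh.read().strip()
        if host:  # skip empty files (e.g. tasks that have not run yet)
            counts[host] += 1
    return counts


if __name__ == "__main__":
    counts = tally_nodes()
    print(f"{sum(counts.values())} array tasks on {len(counts)} distinct node(s)")
    for host, n in counts.most_common():
        print(f"{host}: {n}")
```

If more than one node shows up, that is consistent with `--nodes=1` being applied to each array task rather than to the array as a whole.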

[slurm-users] Re: Container Jobs "hanging"

2024-05-31 Thread Joshua Randall via slurm-users
Just an update to say that this issue for me appears to be specific to the `runc` runtime (or `nvidia-container-runtime` when it uses `runc` internally). I switched to using `crun` and the problem went away -- containers run using `srun --container` now terminate after the inner process terminates.

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-31 Thread Benjamin Smith via slurm-users
It could be systemd doing that. Since slurmdbd is being started with `-D`, I would verify that `slurmdbd.service` has `Type=simple` and not `Type=forking`. The `systemctl status` output later in the thread shows systemd starting slurmdbd with `-D`. If that's the slurmdbd package from Ubuntu you might f
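To check this mismatch mechanically, the merged unit (for example, the text printed by `systemctl cat slurmdbd`, which includes any drop-ins) can be parsed and its `Type=` compared with whether `ExecStart=` passes `-D`. This is a rough heuristic sketch, not a full systemd parser: it relies on Python's `configparser` accepting unit-file syntax closely enough, and the function name `check_unit` is hypothetical.

```python
import configparser


def check_unit(text):
    """Return a warning if slurmdbd runs in the foreground (-D) but the
    unit is not Type=simple; return None otherwise."""
    # strict=False tolerates the repeated [Service] sections that appear
    # when drop-ins are concatenated; later values win.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # systemd keys are case-sensitive
    parser.read_string(text)
    service = parser["Service"] if parser.has_section("Service") else {}
    # systemd defaults to Type=simple when ExecStart= is set and Type= is not.
    unit_type = service.get("Type", "simple")
    exec_start = service.get("ExecStart", "")
    if "-D" in exec_start.split() and unit_type != "simple":
        return f"Type={unit_type} but ExecStart runs in the foreground: {exec_start}"
    return None
```

With `Type=forking`, systemd waits for the started process to fork and exit; a daemon kept in the foreground by `-D` never does that, so the start can time out and the service gets killed.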