I am getting an unusual error when trying to run Podman containers
using scrun on Slurm 23.11.3 (and previously on 23.11.1). In short, Podman
works when it is not configured to use scrun, but fails when it is.
Podman gives this error:
scrun: fatal: Unable to request job alloca
To protect against hardware failure, and to have a freer hand when upgrading
the underlying OS, we use virtualization with "live migration"/HA and run
the MariaDB server as a VM.
A VM is easy to back up, restore from a snapshot, clone for tests, etc.
In the past, I deployed (customer-requirement) one site u
Hi,
I am getting the following error in the logs whenever I run a few srun jobs in
a batch.
Jan 25 11:24:03 slurmctl.XYZ slurmctld[272961]: slurmctld: debug: _send_timeout: Socket POLLERR: Connection reset by peer
Jan 25 11:24:03 slurmctl.XYZ slurmctld[272961]: slurmctld: error: slurm_s
Hi,
I want to run a simple test that uses one node and four cores. In my
script, I also execute a binary that reports which core each of the
four tasks is running on. These are my files:
* submit script:
#!/bin/bash
#SBATCH --job-name=test_jobs # Job name
#SBATCH --output=test_jo