date:20240125

[slurm-users] Problem using Podman with scrun on SLURM 23.11.3

2024-01-25 Thread Marcus Lauer

I am getting an unusual error when trying to run Podman containers using scrun on SLURM 23.11.3 (and 23.11.1 previously). In short, Podman works when not configured to use scrun, but when configured to use scrun it fails. Podman gives this error: scrun: fatal: Unable to request job alloca

Re: [slurm-users] Database cluster

2024-01-25 Thread Josef Dvoracek

To protect against HW failure, and to have a freer hand when upgrading the underlying OS, we use virtualization with "live migration"/HA and run the MariaDB server as a VM. A VM is easy to back up, restore from a snapshot, clone for possible tests, etc. In the past, I deployed (as a customer requirement) one site u

[slurm-users] slurmctld: slurm_bufs_sendto(msg_type=SRUN_STEP_SIGNAL) failed: Connection reset by peer

2024-01-25 Thread Rike-Benjamin Schuppner

Hi, I am getting the following error in the logs whenever I run a few srun jobs in a batch. Jan 25 11:24:03 slurmctl.XYZ slurmctld[272961]: slurmctld: debug: _send_timeout: Socket POLLERR: Connection reset by peer Jan 25 11:24:03 slurmctl.XYZ slurmctld[272961]: slurmctld: error: slurm_s

[slurm-users] Question about CPUs and cores

2024-01-25 Thread Gestió Servidors

Hi, I want to run a simple test that uses one node and four cores. In my script, I also execute a binary that reports which core each of the four tasks is running on. These are my files: * submit script: #!/bin/bash #SBATCH --job-name=test_jobs # Job name #SBATCH --output=test_jo

4 matches
