Re: [slurm-users] Database cluster

2024-01-26 Thread Tina Friedrich
We do the same as Josef: we run the database on a single MariaDB VM and leave it to VMware (in our case) to ensure its availability.

Tina

2024-01-25 Thread Josef Dvoracek
To protect against hardware failure, and to have a freer hand when upgrading the underlying OS, we use virtualization with "live migration"/HA and run the MariaDB server as a VM. A VM is easy to back up, restore from a snapshot, clone for tests, etc. In the past, I deployed (as a customer requirement) one site u

2024-01-24 Thread Henkel, Andreas
Hi Daniel,

We run a simple Galera MySQL cluster and have HAProxy running on all clients to steer requests (round-robin) to one of the DB nodes that answers the health check properly.

Best,
Andreas
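
The round-robin-with-health-check behaviour described in this message can be illustrated with a small Python sketch. This is not HAProxy's implementation and not the poster's configuration; the node names, the `RoundRobinPicker` class and the plain TCP check are assumptions made for illustration. A real Galera health check would usually look at the node's cluster sync state rather than just an open port.

```python
import socket
from typing import Callable, Iterable, List, Optional


def tcp_health_check(host: str, port: int = 3306, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: an open MySQL port is treated as "healthy" here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RoundRobinPicker:
    """Hand out DB nodes in rotation, skipping nodes that fail the check."""

    def __init__(self, nodes: Iterable[str], is_healthy: Callable[[str], bool]):
        self._nodes: List[str] = list(nodes)
        self._is_healthy = is_healthy
        self._next = 0  # index of the node to try first on the next pick

    def pick(self) -> Optional[str]:
        """Return the next healthy node, or None if every node is down."""
        count = len(self._nodes)
        for offset in range(count):
            index = (self._next + offset) % count
            node = self._nodes[index]
            if self._is_healthy(node):
                # Continue the rotation after the node we just handed out.
                self._next = (index + 1) % count
                return node
        return None


if __name__ == "__main__":
    # Hypothetical hostnames; replace the lambda with tcp_health_check
    # to probe real servers.
    up = {"db1", "db3"}
    picker = RoundRobinPicker(["db1", "db2", "db3"], lambda n: n in up)
    print([picker.pick() for _ in range(4)])
```

Passing the health check in as a function keeps the rotation logic testable without any database servers being reachable.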

2024-01-23 Thread Daniel L'Hommedieu
Xand,

Thanks, that’s great to hear. I was thinking of using Anycast to achieve the same thing, but it is good to know that keepalived is a viable solution as well.

Best,
Daniel

2024-01-23 Thread Xand Meaden
Hi,

We are using Percona XtraDB Cluster to achieve HA for our Slurm databases. A single virtual IP is kept on one of the cluster's servers using keepalived.

Regards,
Xand
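
As a rough illustration of how a single virtual IP ends up on exactly one server: keepalived uses VRRP, in which the live node with the highest priority holds the address (when preemption is enabled). The Python sketch below only models that selection rule; the `Node` class, the node names and the priority values are hypothetical, and tie-breaking between equal priorities is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Node:
    name: str        # hypothetical server name
    priority: int    # VRRP-style priority; higher wins
    alive: bool = True


def vip_holder(nodes: Iterable[Node]) -> Optional[Node]:
    """Return the live node with the highest priority, or None if all are down."""
    candidates = [node for node in nodes if node.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda node: node.priority)


if __name__ == "__main__":
    cluster = [Node("pxc1", 150), Node("pxc2", 100), Node("pxc3", 50)]
    print(vip_holder(cluster).name)

    # Simulate the current holder failing: the VIP moves to the next priority.
    cluster[0] = Node("pxc1", 150, alive=False)
    print(vip_holder(cluster).name)
```

Clients such as slurmdbd only ever connect to the virtual IP, so a failover changes which server answers without any client-side reconfiguration.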

2024-01-23 Thread Daniel L'Hommedieu
Hi Diego.

In our setup, the database is critical. We have some wrapper scripts that consult the database for information, and we also set environment variables on login, based on user/partition associations. If the database is down, none of those things work. I doubt there is appetite in the

2024-01-23 Thread Diego Zuccato
IIUC the database is not "critical": if it goes down, you lose access to some statistics. But job data gets cached anyway, and the DB will be updated when it comes back online.

Diego

On 22/01/2024 18:23, Daniel L'Hommedieu wrote:
> Community: What do you do to ensure database reliability i