Re: Estimating wal_keep_size
On 2021-06-16 18:02, Julien Rouhaud wrote: On Wed, Jun 16, 2021 at 05:36:24PM -0700, Dean Gibson (DB Administrator) wrote: Is this reasonable thinking? I'd think that one would want a *wal_keep_size* to cover the pending updates while the standby server might be unavailable, however long one might anticipate that would be. It's usually a better approach to use a replication slot, to keep all the required WAL files, and only when needed. See https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION-SLOTS for more details. Note that a replication slot will keep all WAL files, which might eventually lead to an outage if the standby doesn't come back before the filesystem containing the logs get full. You can cap the maximum amount of retained WAL files since pg 13 using max_slot_wal_keep_size, see https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE. Granted, but the same question arises about the value for max_slot_wal_keep_size. Setting either too low risks data loss, & setting either too high results in unnecessary disk space used. The question was, is the estimated VALUE reasonable under the circumstances?
Re: Estimating wal_keep_size
Le sam. 19 juin 2021 à 02:13, Dean Gibson (DB Administrator) < [email protected]> a écrit : > > Granted, but the same question arises about the value for > max_slot_wal_keep_size. Setting either too low risks data loss, & setting > either too high results in unnecessary disk space used. The question was, > is the estimated VALUE reasonable under the circumstances? > it may be, until one day it won't be. and that day usually happens. when you set up this kind of limit you choose service availability over your data, so you have to accept that it may not be enough. if this is a problem don't setup a limit. >
Re: Estimating wal_keep_size
On Fri, 18 Jun 2021 at 23:58, Julien Rouhaud wrote: > Le sam. 19 juin 2021 à 02:13, Dean Gibson (DB Administrator) < > [email protected]> a écrit : > >> >> Granted, but the same question arises about the value for >> max_slot_wal_keep_size. Setting either too low risks data loss, & setting >> either too high results in unnecessary disk space used. The question was, >> is the estimated VALUE reasonable under the circumstances? >> > > it may be, until one day it won't be. and that day usually happens. when > you set up this kind of limit you choose service availability over your > data, so you have to accept that it may not be enough. if this is a problem > don't setup a limit. > >> yep, that day does come :) and in that case, i used to drop slot (primary is high priority) and rebuild the repica. We already had multiple replicas under load balancer, so it was feasible. Anyways, I used to emit pg_current_wal_lsn to graphite (or any other telemetry monitoring) every minute or so, ot be able to calculate wal growth over period of time. Then used it to estimate how much of the disk would be required for a PITR setup like barman if we use a 7 day WAL post backup. one can use other math expressions like moving avg to estimate wal growth over the time duration of an incident with the replica etc. ofcourse one needs to also calculate how fast the wals would be played back, if one has hot_standby_feedback for long running queries on replicas etc but I think I put my point. -- Thanks, Vijay Mumbai, India
