Re: Estimating wal_keep_size

2021-06-18 Thread Dean Gibson (DB Administrator)

On 2021-06-16 18:02, Julien Rouhaud wrote:

On Wed, Jun 16, 2021 at 05:36:24PM -0700, Dean Gibson (DB Administrator) wrote:

Is this reasonable thinking?

I'd think that one would want a *wal_keep_size* to cover the pending updates 
while the standby server might be unavailable, however long one might 
anticipate that would be.

It's usually a better approach to use a replication slot, to keep all the 
required WAL files, and only when needed.  See 
https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION-SLOTS
 for more details.

Note that a replication slot will keep all WAL files, which might eventually 
lead to an outage if the standby doesn't come back before the filesystem 
containing the logs get full.  You can cap the maximum amount of retained WAL 
files since pg 13 using max_slot_wal_keep_size, see
https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE.


Granted, but the same question arises about the value for 
max_slot_wal_keep_size.  Setting either too low risks data loss, & 
setting either too high results in unnecessary disk space used.  The 
question was, is the estimated VALUE reasonable under the circumstances?


Re: Estimating wal_keep_size

2021-06-18 Thread Julien Rouhaud
Le sam. 19 juin 2021 à 02:13, Dean Gibson (DB Administrator) <
[email protected]> a écrit :

>
> Granted, but the same question arises about the value for
> max_slot_wal_keep_size.  Setting either too low risks data loss, & setting
> either too high results in unnecessary disk space used.  The question was,
> is the estimated VALUE reasonable under the circumstances?
>

it may be, until one day it won't be. and that day usually happens. when
you set up this kind of limit you choose service availability over your
data, so you have to accept that it may not be enough. if this is a problem
don't setup a limit.

>


Re: Estimating wal_keep_size

2021-06-18 Thread Vijaykumar Jain
On Fri, 18 Jun 2021 at 23:58, Julien Rouhaud  wrote:

> Le sam. 19 juin 2021 à 02:13, Dean Gibson (DB Administrator) <
> [email protected]> a écrit :
>
>>
>> Granted, but the same question arises about the value for
>> max_slot_wal_keep_size.  Setting either too low risks data loss, & setting
>> either too high results in unnecessary disk space used.  The question was,
>> is the estimated VALUE reasonable under the circumstances?
>>
>
> it may be, until one day it won't be. and that day usually happens. when
> you set up this kind of limit you choose service availability over your
> data, so you have to accept that it may not be enough. if this is a problem
> don't setup a limit.
>
>>
yep, that day does come :) and in that case, i used to drop slot (primary
is high priority) and rebuild the repica. We already had multiple replicas
under load balancer, so it was feasible.

Anyways, I used to emit pg_current_wal_lsn to graphite (or any other
telemetry monitoring) every minute or so, ot be able to calculate wal
growth over period of time.
Then used it to estimate how much of the disk would be required for a PITR
setup like barman if we use a 7 day WAL post backup.
one can use other math expressions like  moving avg to estimate wal
growth  over the time duration of an incident with the replica etc.
ofcourse one needs to also calculate how fast the wals would be played
back, if one has hot_standby_feedback for long running queries on replicas
etc but I think I put my point.


-- 
Thanks,
Vijay
Mumbai, India