Dealing with latency to replication slave; what to do?

2018-07-24 Thread Rory Falloon
Hi,

Looking for any tips on how best to maintain a replication slave that is
operating over a link with some latency between networks - around 230ms. On
a good day/week, replication will keep up for a number of days; however,
when the link is under higher-than-average usage, replication may stay
active for merely minutes before falling behind again.

2018-07-24 18:46:14 GMTLOG:  database system is ready to accept read only
connections
2018-07-24 18:46:15 GMTLOG:  started streaming WAL from primary at
2B/9300 on timeline 1
2018-07-24 18:59:28 GMTLOG:  incomplete startup packet
2018-07-24 19:15:36 GMTLOG:  incomplete startup packet
2018-07-24 19:15:36 GMTLOG:  incomplete startup packet
2018-07-24 19:15:37 GMTLOG:  incomplete startup packet

As you can see above, it lasted about half an hour before falling out of
sync.

On the master, I have wal_keep_segments=128. What is happening when I see
"incomplete startup packet" - is it simply that the slave has fallen behind
and cannot 'catch up' using the WAL segments quickly enough? I assume the
slave is using the WAL segments to replay changes, and that as long as there
are enough WAL segments to cover the period during which it cannot stream
properly, it will eventually recover?
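
For reference, a quick way to confirm whether the standby has genuinely
fallen behind (rather than just logging unrelated noise) is to compare WAL
positions on both ends. A minimal sketch, assuming PostgreSQL 10 or later;
on 9.x the equivalent functions use the *_xlog_location() names:

-- on the primary: per-standby state and how far replay is behind
SELECT client_addr, state, sent_lsn, replay_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- on the standby: last WAL received vs. replayed, and the replay timestamp
SELECT pg_last_wal_receive_lsn(),
       pg_last_wal_replay_lsn(),
       pg_last_xact_replay_timestamp();

If the standby falls so far behind that the primary has already recycled the
segments it needs, the standby's log will typically show "requested WAL
segment ... has already been removed" rather than anything about startup
packets.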


Re: Dealing with latency to replication slave; what to do?

2018-07-24 Thread Rory Falloon
Hi Andres,

Regarding your first reply: I was inferring that from the fact that I saw
those messages at the same time the replication stream fell behind. What
other logs would be more pertinent to this situation?



On Tue, Jul 24, 2018 at 4:02 PM Andres Freund  wrote:

> Hi,
>
> On 2018-07-24 15:39:32 -0400, Rory Falloon wrote:
> > Looking for any tips on how best to maintain a replication slave that is
> > operating over a link with some latency between networks - around 230ms.
> > On a good day/week, replication will keep up for a number of days;
> > however, when the link is under higher-than-average usage, replication
> > may stay active for merely minutes before falling behind again.
> >
> > 2018-07-24 18:46:14 GMTLOG:  database system is ready to accept read only
> > connections
> > 2018-07-24 18:46:15 GMTLOG:  started streaming WAL from primary at
> > 2B/9300 on timeline 1
> > 2018-07-24 18:59:28 GMTLOG:  incomplete startup packet
> > 2018-07-24 19:15:36 GMTLOG:  incomplete startup packet
> > 2018-07-24 19:15:36 GMTLOG:  incomplete startup packet
> > 2018-07-24 19:15:37 GMTLOG:  incomplete startup packet
> >
> > As you can see above, it lasted about half an hour before falling out of
> > sync.
>
> How can we see that from the above? The "incomplete startup packet"
> messages are independent of streaming rep. I think you need to show us
> more logs.
>
>
> > On the master, I have wal_keep_segments=128. What is happening when I
> > see "incomplete startup packet" - is it simply that the slave has fallen
> > behind and cannot 'catch up' using the WAL segments quickly enough? I
> > assume the slave is using the WAL segments to replay changes, and that
> > as long as there are enough WAL segments to cover the period during
> > which it cannot stream properly, it will eventually recover?
>
> You might want to look into replication slots to ensure the primary
> keeps the necessary segments, but not more, around.  You might also want
> to look at wal_compression, to reduce the bandwidth usage.
>
> Greetings,
>
> Andres Freund
>

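For anyone reading along, a rough sketch of what Andres is suggesting. The
slot name 'standby1' is just a placeholder; pg_create_physical_replication_slot()
exists from 9.4, wal_compression from 9.5, and on releases before 12 the
standby points at the slot from recovery.conf rather than postgresql.conf:

-- on the primary: create a physical slot so WAL needed by the standby is
-- retained until the standby has actually consumed it
SELECT pg_create_physical_replication_slot('standby1');

-- on the primary, postgresql.conf: compress full-page writes to save
-- bandwidth over the slow link
--   wal_compression = on

-- on the standby (recovery.conf before v12, postgresql.conf from v12 on):
--   primary_slot_name = 'standby1'

The usual caveat with slots: if the standby disappears for a long time, the
primary holds WAL for it indefinitely, so disk usage under pg_xlog / pg_wal
needs to be watched.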

WAL streaming and dropping a large table

2021-10-21 Thread Rory Falloon
Hi,

My postgres server setup is thus: a production database, which is using WAL
streaming (hot standby) to four other hosts of various latencies. I have a
table that is ~200GB which has been backed up, and now I want to remove it.
The table is not in use; it is a child table of a parent table that _is_ in
use, but I foresee no issues there. The total DB size is roughly 300GB. The
main reason for needing to remove it is to recover the space on the
partition, which is humming away at 88% usage, and the table I wish to drop
is better off in an archive somewhere.

I just removed around 10% of it with 'delete from', which of course cascaded
to the replication hosts. This increased the size of my pg_xlog folder (from
~3GB to ~6.5GB), which in turn increased my partition usage. Obviously this
is expected, but I wonder what would have happened if I had issued the 'drop
table' instead?
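
As a rough way to see how much WAL a given statement generates (the
row-by-row 'delete from' above logs every deleted row, whereas 'drop table'
only logs the catalog changes and unlinks the files, so it produces far less
WAL), you can bracket the statement with WAL-position calls from psql. A
sketch assuming PostgreSQL 10+ naming and a hypothetical table name; on 9.x
the functions are pg_current_xlog_location() and pg_xlog_location_diff():

-- in psql, note the WAL position before the statement
SELECT pg_current_wal_lsn() AS before_lsn \gset

-- run the statement in question, e.g.:
-- DELETE FROM big_child_table WHERE ...;

-- then see how many bytes of WAL it generated
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')
       ) AS wal_generated;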

I expect that having replication slots enabled (max_replication_slots) means
the database will retain the WAL segments until all standbys have caught up;
could it take quite a long time to 'catch up' after the 'drop table'
command? It took about 10 minutes before the pg_xlog folder size had
'settled down' to what I normally see by default, which is ~3GB.
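
If the standbys are attached via replication slots, the primary exposes how
much WAL it is currently holding back for each of them, which shows whether
the pg_xlog growth is simply a slow standby that has not caught up yet. A
sketch (the same 10+ naming caveat as above applies on 9.x):

SELECT slot_name, active, restart_lsn,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS wal_retained
FROM pg_replication_slots;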

> wal_keep_segments is defined as 128,
> max_wal_size is not defined,
> max_replication_slots is set (replication slots are in use)

I'd prefer to use 'drop table' as it would recover the space immediately,
but not if it will impact the production database in a way that causes the
partition to become full, which defeats the purpose. Is it advisable to move
the pg_xlog folder to another volume on the system with more space (which I
have) and symlink it - obviously a stop & start of the db is required - and
then let the WAL segments fill up as need be? Or am I missing something
obvious (likely)?

Thanks