Logical Replication of Multiple Schema Versions

2019-12-09 Thread Dan shmidt
Hi All,

Thanks for having such a great mailing list. I hope I'm sending to the correct 
distribution.

I'll start by describing my current architecture and where I use logical 
replication.
I have several nodes, each running a PostgreSQL database. Each of these nodes 
has an API server which uses the database to retrieve data.
I recently created an aggregated node which, according to the logical 
replication documentation, matches a common use case: "Consolidating multiple 
databases into a single one (for example for analytical purposes)."
The aggregated node also has an API server and should serve the same use cases 
as a regular node, but with aggregated information.
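For context, the topology described above can be sketched roughly like this (all object names — publication, subscription, table, connection string — are hypothetical, not from the original post):

```sql
-- On each regular node (publisher):
CREATE PUBLICATION node1_pub FOR TABLE measurements;

-- On the aggregated node (subscriber), one subscription per regular node:
CREATE SUBSCRIPTION agg_sub_node1
    CONNECTION 'host=node1 dbname=appdb user=replicator'
    PUBLICATION node1_pub;
```

Each subscription has its own replication slot on its publisher, which matters for the upgrade ordering discussed below.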

My question is about schema upgrades. As is well documented, logical 
replication does not replicate schema changes, so all schema modifications 
must be applied on both the regular-nodes and the aggregated-node.
Here are my constraints and relaxation points:

  *   I would prefer to have the freedom to make any schema change (e.g., 
removing or renaming a column).
  *   I must be able to upgrade each node separately, without taking all of 
the nodes down.
  *   I'm willing to allow a rather long (2 week) downtime in the replication 
as long as each regular-node keeps working. (Hopefully the aggregated-node can 
still be operative, just not up to date with the regular-nodes.)
  *   I'm willing to allow downtime of a regular node while that node's 
upgrade is taking place.

I started with a naive approach:

  1.  Take down all the nodes
  2.  Schema upgrade for the aggregated-node
  3.  Schema upgrade for the regular-nodes
  4.  Start everything up
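For concreteness, steps 1 and 4 might look roughly like this on the aggregated node (subscription name is hypothetical). Disabling a subscription pauses the apply worker without dropping the slot, so the publisher keeps retaining WAL:

```sql
-- Step 1: pause replication on the aggregated node (repeat per subscription)
ALTER SUBSCRIPTION agg_sub_node1 DISABLE;

-- ... steps 2-3: apply the schema upgrades on both sides ...

-- Step 4: resume replication
ALTER SUBSCRIPTION agg_sub_node1 ENABLE;
```

Note that this alone does not avoid the stale-WAL problem described next: any changes queued before the upgrade are still replayed with the old schema once the subscription is re-enabled.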

The problem I encountered with this approach is that when taking down the 
regular-nodes there may be some not-yet-replicated WAL entries encoded with 
the pre-upgrade schema, so after step 4 above replication will fail.
My next approach was to let the logical replication "drain" until there are no 
pending changes and then upgrade all of the nodes, but this breaks 
constraint #2.
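A sketch of how the "drain" condition could be checked on each regular node (publisher); the slot name is hypothetical. Replication has drained once confirmed_flush_lsn catches up to the current WAL position, assuming writes have stopped:

```sql
SELECT slot_name,
       pg_current_wal_lsn()   AS current_lsn,
       confirmed_flush_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_name = 'agg_sub_node1';
```

When lag_bytes reaches 0 (and stays there), the subscriber has consumed everything written under the old schema.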

What is the correct way to perform such an operation?
Is there a way to keep constraint #1, or is the only option to disallow 
"breaking" schema changes between versions?

Thank you,
Dan.


Re: Logical Replication of Multiple Schema Versions

2019-12-11 Thread Dan shmidt
Thanks Peter for your answer.

I was hoping to keep the freedom to make any change to my schema.
Since the entire upgrade might take more than a week, there will be a time in 
which I have two publisher nodes with different schema versions (which might 
include a column rename).
I was hoping there was a way to apply some logic on the subscriber to convert 
one schema to the other. I also believe that limiting schema changes to those 
that won't break replication should suffice.

Dan.


From: Peter Eisentraut 
Sent: Wednesday, December 11, 2019 6:13 PM
To: Dan shmidt ; pgsql-general@lists.postgresql.org 

Subject: Re: Logical Replication of Multiple Schema Versions

On 2019-12-10 08:55, Dan shmidt wrote:
> What is the correct way to perform such an operation?
> Is there a way to keep constraint #1 or the only option is to not allow
> "breaking" schema changes between versions.

It all depends on the specific schema changes you want to make.  You can
add columns on the subscriber and remove columns on the publisher
without breaking things (unless there are not-null constraints).
Renaming columns will break replication until you rename them
everywhere.  Column type changes will usually just work as long as the
data fits into both the old and the new type.
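The compatible changes Peter describes can be sketched as follows (table and column names are hypothetical); the key is the order in which each side is altered:

```sql
-- Adding a column: subscriber first, publisher later.
-- Until the publisher catches up, the subscriber's extra column stays NULL
-- (or takes its default) -- hence the caveat about NOT NULL constraints.
--   on subscriber:  ALTER TABLE measurements ADD COLUMN note text;
--   on publisher:   ALTER TABLE measurements ADD COLUMN note text;

-- Removing a column: publisher first, subscriber later.
-- The subscriber simply stops receiving values for the dropped column.
--   on publisher:   ALTER TABLE measurements DROP COLUMN note;
--   on subscriber:  ALTER TABLE measurements DROP COLUMN note;
```

A rename, by contrast, has no safe ordering: the apply worker matches columns by name, so replication errors out until both sides agree again.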

You really need to carefully plan and test each class of scenarios
separately.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Logical replication stuck in catchup state

2020-06-09 Thread Dan shmidt
Thank you very much for your replies.

Regarding the server logs, I didn't find anything but a healthy message at 
server start saying that it will resume from the same point in the WAL that 
was last sent.
Regarding bugfixes, I will try to update ASAP - but wouldn't a restart of the 
server release the lock? Is there a way to release the lock manually?

Any other suggestion on how to recover from this state without upgrading?
Is there a way to restart the replication from scratch?



________
From: Dan shmidt
Sent: Wednesday, June 10, 2020 12:30 AM
To: pgsql-gene...@postgresql.org 
Subject: Logical replication stuck in catchup state

Hi All,

We have a setup in which several master nodes replicate to a single 
slave/backup node. We are using PostgreSQL 11.4.
Recently, one of the nodes seems to be stuck and stopped replicating.
I did some basic troubleshooting and couldn't find the root cause for that.

On one hand:
  - the replication slot does seem to be active according to 
pg_replication_slots (sorry, no screenshot)
  - on the slave node, last_msg_receipt_time keeps updating in 
pg_stat_subscription

On the other hand:
  - on the slave node, received_lsn keeps pointing at the same WAL position 
(pg_stat_subscription)
  - redo_lsn - restart_lsn shows ~20GB of lag
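The ~20GB figure above can be reproduced on the master with something like the following sketch (redo_lsn comes from the control file; slot names are whatever pg_replication_slots reports):

```sql
SELECT slot_name,
       active,
       restart_lsn,
       pg_wal_lsn_diff((pg_control_checkpoint()).redo_lsn,
                       restart_lsn) AS redo_minus_restart_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(),
                       restart_lsn) AS total_retained_bytes
FROM pg_replication_slots;
```

A large, growing total_retained_bytes for an "active" slot is consistent with a walsender that keeps restarting from the same position.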

According to the logs on the master, the sender hits a timeout; increasing 
wal_sender_timeout, even to 0 (no timeout), has no effect. On the other hand, 
last_msg_receipt_time is updated. How is that possible?

Screenshots attached. The stuck subscription/replication slot is the one ending 
with "53db6". On images with more than one row - it's the second one.


Any suggestions on what may be the root cause or how to continue debugging?
Appreciate your help.

Thank you,
Dan.




Logical replication from multiple masters - master starvation

2020-08-04 Thread Dan shmidt
Hi everyone,

I'm using PostgreSQL 11.4.
I have a configuration in which one subscriber (slave) node subscribes to 
several (4) publisher nodes (masters).
I noticed one of the master servers keeps lagging behind, and that the 
replication worker for this master keeps failing on timeouts.
When querying pg_stat_subscription on the slave, I noticed that 
last_msg_send_time for this master hardly ever updates.
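The per-subscription comparison described above can be done with a query along these lines on the slave; a subscription whose send/receipt timestamps lag far behind its siblings is the suspect:

```sql
SELECT subname,
       received_lsn,
       latest_end_lsn,
       last_msg_send_time,
       last_msg_receipt_time
FROM pg_stat_subscription
ORDER BY last_msg_send_time;
```

Comparing received_lsn against pg_current_wal_lsn() on each corresponding master then quantifies how far each subscription is behind.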

Could this be starvation?
Is there any special configuration on either side (master/slave) that can help 
reduce that?
What's the logic on the slave that chooses from which subscription to process 
incoming data?
How can I further debug this?

Thanks,
Dan.
