On Mon, Sep 8, 2025 at 11:03 AM legrand legrand <[email protected]> wrote:
> Hello all the readers, > > For some projects we need a fast *manual* switchover to address Near Zero > downtime maintenance > (not speaking here about automated failover like those provided by HA > tools, but just planned, controlled operations) > > > Database Physical replication switchover itself: > - initial replication (before switchover) should be synchronous or > replication LAG should be controlled to prevent data loss. > - Switchover duration seems not "compressible" under a few seconds > (because of primary shutdown, promotion, new standby catch up, ...) > - Application retry strategy (after disconnection) should be tuned using > proper retry delay. Pooler or specific driver may help. > There will always be a few seconds delay while the applications reconnect. Do the applications connect via a VIP? That's simpler for the application. This is what I do from the not-yet-new-primary: 1. psql -h $CurrentPrimary -c "ALTER SYSTEM SET synchronous_standby_names TO '*';" 2. Wait a few seconds. 3. ssh $CurrentPrimary sudo ip del $VIP # cmd is more complicated, but you get the idea 4. ssh $CurrentPrimary pg_ctl stop -mfast # to kill connections, has to happen, no matter the solution. 5. pg_ctl promote 6. sudo ip add $VIP 7. Replicate from new-primary to new-replica "at leisure". No retry delay, since the application directly goes to the new server. Steps 3-6 are in a script, and what pgpool does, except I do it. #4 is by far the slowest. ssh authentication delay in #3 and #4 are nonexistent if you have "pre-created" an ssh socket. -- Death to <Redacted>, and butter sauce. Don't boil me, I'm still alive. <Redacted> lobster!
