On Mon, Sep 8, 2025 at 11:03 AM legrand legrand <[email protected]>
wrote:

> Hello all the readers,
>
> For some projects we need a fast *manual* switchover to address Near Zero
> downtime maintenance
> (not speaking here about automated failover like those provided by HA
> tools, but just planned, controlled operations)
>
>
> Database Physical replication switchover itself:
> - initial replication (before switchover) should be synchronous or
> replication LAG should be controlled to prevent data loss.
> - Switchover duration seems not "compressible" under a few seconds
> (because of primary shutdown, promotion, new standby catch up, ...)
> - Application retry strategy (after disconnection) should be tuned using
> proper retry delay. Pooler or specific driver may help.
>

There will always be a few seconds delay while the applications reconnect.

Do the applications connect via a VIP?  That's simpler for the application.

This is what I do from the not-yet-new-primary:

   1. psql -h $CurrentPrimary
      -c "ALTER SYSTEM SET synchronous_standby_names TO '*';"
      -c "SELECT pg_reload_conf();"
      # one command line; ALTER SYSTEM only takes effect after a reload
   2. Wait a few seconds.
   3. ssh $CurrentPrimary sudo ip addr del $VIP # cmd is more complicated,
      but you get the idea
   4. ssh $CurrentPrimary pg_ctl stop -m fast # to kill connections; has to
      happen, no matter the solution.
   5. pg_ctl promote
   6. sudo ip addr add $VIP
   7. Replicate from new-primary to new-replica "at leisure".
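
For illustration, steps 3-6 could be wrapped in a script roughly like the one below. This is only a sketch, not my actual script: the host name, VIP, interface name, data directory and the DRY_RUN switch are all placeholders I made up for the example, and it assumes pg_ctl is on the PATH of the ssh user on both machines and that PGDATA is the same path on both. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of steps 3-6, run on the standby that is about to become primary.
# Every value below is a placeholder (assumption), not a real setting.
CURRENT_PRIMARY=${CURRENT_PRIMARY:-old-primary.example}
VIP=${VIP:-192.0.2.10/24}
IFACE=${IFACE:-eth0}
PGDATA=${PGDATA:-/var/lib/postgresql/data}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded.
switchover() {
    run ssh "$CURRENT_PRIMARY" sudo ip addr del "$VIP" dev "$IFACE" &&  # step 3
    run ssh "$CURRENT_PRIMARY" pg_ctl stop -D "$PGDATA" -m fast &&      # step 4
    run pg_ctl promote -D "$PGDATA" &&                                  # step 5
    run sudo ip addr add "$VIP" dev "$IFACE"                            # step 6
}

switchover
```

Chaining with && means the VIP is never added locally unless the old primary has actually been stopped and the promotion succeeded.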

No retry delay is needed, since the application goes directly to the new
server. Steps 3-6 are in a script; it is what pgpool does, except that I run
it myself.  #4 is by far the slowest.  The ssh authentication delay in #3 and
#4 is nonexistent if you have "pre-created" an ssh socket.
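
One way to "pre-create" the socket is OpenSSH connection multiplexing. The sketch below assumes OpenSSH on both ends; the host name, the socket path and the DRY_RUN switch are placeholders for the example, not values from my setup. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: open an ssh master connection ahead of time, then reuse it.
# Host name and socket path are placeholders (assumptions).
CURRENT_PRIMARY=${CURRENT_PRIMARY:-old-primary.example}
SOCK=${SOCK:-/tmp/switchover-ssh.sock}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# -M: act as master, -S: control socket path, -f: go to background,
# -N: do not run a remote command. Authentication happens here, once.
open_master()  { run ssh -M -S "$SOCK" -f -N "$CURRENT_PRIMARY"; }

# Later commands reuse the socket and skip the authentication handshake.
via_master()   { run ssh -S "$SOCK" "$CURRENT_PRIMARY" "$@"; }

# -O exit asks the master process to shut down and remove the socket.
close_master() { run ssh -S "$SOCK" -O exit "$CURRENT_PRIMARY"; }

open_master                      # well before the maintenance window
via_master pg_ctl stop -m fast   # e.g. step 4, over the existing connection
close_master                     # once the switchover is done
```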

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
