On Sun, Mar 22, 2026 at 1:52 AM Amit Kapila <[email protected]> wrote:
>
> On Wed, Mar 18, 2026 at 9:35 PM Fujii Masao <[email protected]> wrote:
> >
> > I noticed that during standby promotion the startup process sends SIGUSR1 to
> > the slotsync worker to make it exit. Is there a reason for using SIGUSR1?
> >
>
> IIRC, this same signal is used for both the backend executing
> pg_sync_replication_slots() and slotsync worker. We want the worker to
> exit and error_out backend. Using SIGTERM for backend could result in
> its exit.

Why do we want the backend running pg_sync_replication_slots() to throw an
error here, rather than just exit? If emitting an error is really required,
another option would be to store the process type in SlotSyncCtx and send
different signals accordingly, for example, SIGTERM for the slotsync worker
and another signal for a backend. But it seems simpler and sufficient to have
the backend exit in this case as well.

> Also, we want the last slotsync cycle to complete before
> promotion so that chances of subscribers that do failover/switchover
> to new primary has better chances of finding failover slots
> sync-ready.

I'm not sure how much this behavior helps in failover/switchover scenarios.
But the main issue is that if a primary crash triggers standby promotion,
that last slotsync cycle can get stuck waiting for input from the primary,
which delays promotion. IOW, failover time can become unnecessarily long due
to the slotsync worker. I'd like to address that problem.

Regards,

--
Fujii Masao
