Hi,

I noticed that during standby promotion the startup process sends SIGUSR1 to
the slotsync worker to make it exit. Is there a reason for using SIGUSR1?

If the slotsync worker is blocked waiting for input from the primary (e.g.,
due to a network outage between the primary and standby), SIGUSR1 won't
interrupt the wait. As a result, the worker can remain stuck and delay
promotion for a long time.

Would it make sense to send SIGTERM instead, so the worker can exit promptly
even while waiting? I've attached a WIP patch that does this. I haven't updated
the source comments yet, but I can do so if we agree on the approach.

SIGTERM alone is not sufficient, though. A new slotsync worker could start
immediately after the old one exits and block promotion again. To address this,
the patch makes a newly started worker exit immediately if promotion is
in progress.
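
To illustrate the idea, here is a simplified sketch, not the patch itself. The
struct and function names are hypothetical, and the locking that real
shared-memory access would need is omitted:

```c
#include <stdbool.h>

/* Hypothetical stand-in for shared state; not the actual PostgreSQL struct. */
typedef struct SlotSyncState
{
    bool promotion_in_progress; /* set by the startup process at promotion */
    int  worker_pid;            /* 0 when no worker has registered */
} SlotSyncState;

/*
 * Startup-process side: record that promotion has begun, then return the
 * PID that should receive SIGTERM (0 if no worker is registered).
 */
int
begin_promotion(SlotSyncState *st)
{
    st->promotion_in_progress = true;
    return st->worker_pid;
}

/*
 * Worker side: called first thing at worker start. Returns false if the
 * worker must exit immediately because promotion is already in progress.
 */
bool
worker_may_start(SlotSyncState *st, int my_pid)
{
    if (st->promotion_in_progress)
        return false;
    st->worker_pid = my_pid;
    return true;
}
```

Because the flag is set before the signal is sent, a worker launched after the
old one exits sees the flag and gives up instead of blocking promotion again.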

Thoughts?

Regards,

-- 
Fujii Masao

Attachment: v1-0001-Use-SIGTERM-to-stop-slotsync-worker-during-standb.patch
Description: Binary data
