On Thu, Mar 19, 2026 at 7:08 PM Fujii Masao <[email protected]> wrote:
>
> On Sat, Feb 28, 2026 at 2:03 PM Amit Kapila <[email protected]> wrote:
> >
> > On Fri, Feb 27, 2026 at 8:34 PM Fujii Masao <[email protected]> wrote:
> > >
> > > Normally, the slotsync worker updates the standby slot using the primary's
> > > slot state. However, when confirmed_flush_lsn matches but restart_lsn does
> > > not, the worker does not actually update the standby slot. Despite that,
> > > the current code of update_local_synced_slot() appears to treat this
> > > situation as if an update occurred. As a result, the worker sleeps only for
> > > the minimum interval (200 ms) before retrying. In the next cycle, it again
> > > assumes an update happened, and continues looping with the short sleep
> > > interval, causing the repeated logical decoding log messages. Based on a
> > > quick analysis, this seems to be the root cause.
> > >
> > > I think update_local_synced_slot() should return false (i.e., no update
> > > happened) when confirmed_flush_lsn is equal but restart_lsn differs between
> > > primary and standby.
> > >
> >
> > We expect that in such a case update_local_synced_slot() should
> > advance local_slot's 'restart_lsn' via
> > LogicalSlotAdvanceAndCheckSnapState(), otherwise, it won't go in the
> > cheap code path next time. Normally, restart_lsn advancement should
> > happen when we process XLOG_RUNNING_XACTS and call
> > SnapBuildProcessRunningXacts(). In this particular case as both
> > restart_lsn and confirmed_flush_lsn are the same (0/03000140), the
> > machinery may not be processing XLOG_RUNNING_XACTS record. I have not
> > debugged the exact case yet but you can try by emitting some more
> > records on publisher, it should let the standby advance the slot. It
> > is possible that we can do something like you are proposing to silence
> > the LOG messages but we should know what is going on here.
>
> I agree it's important to investigate why restart_lsn doesn't advance
> in that case and fix that issue.
>
> Separately, I think mistreating the slot as updated is another problem:
> the local slot is treated as updated even when it isn't, which causes
> the slotsync worker to use the minimum sleep interval. We should
> address this independently.
>
> LogicalSlotAdvanceAndCheckSnapState() doesn't always update the local slot,
> but update_local_synced_slot() currently assumes it does. That seems to be
> the root cause. The attached 0001 patch makes
> LogicalSlotAdvanceAndCheckSnapState() return whether the slot was actually
> updated, and updates update_local_synced_slot() to act based on that result.
>
> I've also attached 0002, which reduces the log level of some logical decoding
> messages (e.g., "logical decoding found consistent point") from LOG to
> DEBUG1. These are low-level, developer-oriented messages and can be noisy
> when they occur regularly. This is the same issue I reported earlier at [1].
>
> How about applying these patches while we continue working on
> the restart_lsn issue?
>
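
To make the sleep-interval effect described in the quoted analysis concrete, here is a minimal standalone sketch, not the actual slotsync code. It assumes the worker resets its nap time to the minimum whenever some slot is reported as updated and otherwise doubles it up to a cap. The names next_naptime_ms() and naptime_after_cycles() are hypothetical, and the 30 s cap is an assumption for illustration; only the 200 ms minimum comes from the discussion above.

```c
#include <stdbool.h>

#define MIN_NAPTIME_MS 200L    /* minimum interval cited in the quoted analysis */
#define MAX_NAPTIME_MS 30000L  /* assumed cap, for illustration only */

/*
 * Return the next sleep interval: reset to the minimum when some slot was
 * reported as updated, otherwise double the current one up to the cap.
 */
static long
next_naptime_ms(long cur_ms, bool some_slot_updated)
{
    long doubled;

    if (some_slot_updated)
        return MIN_NAPTIME_MS;

    doubled = cur_ms * 2;
    return (doubled > MAX_NAPTIME_MS) ? MAX_NAPTIME_MS : doubled;
}

/* Interval reached after n consecutive cycles that all report the same result. */
static long
naptime_after_cycles(int n, bool reported_updated)
{
    long nap = MIN_NAPTIME_MS;

    for (int i = 0; i < n; i++)
        nap = next_naptime_ms(nap, reported_updated);
    return nap;
}
```

Under these assumptions, a slot that is wrongly reported as updated on every cycle keeps the worker at the minimum interval indefinitely, whereas correctly reporting "no update" lets the interval back off toward the cap.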

*
@@ -2174,7 +2193,10 @@ LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr moveto,

  if (XLogRecPtrIsValid(ctx->reader->EndRecPtr))
  {
- LogicalConfirmReceivedLocation(moveto);
+ bool slot_updated = LogicalConfirmReceivedLocation(moveto);
+
+ if (updated_xmin_or_lsn)
+ *updated_xmin_or_lsn = slot_updated;

BTW, LogicalSlotAdvanceAndCheckSnapState() could also advance the slot
while processing a running_xacts record, so I am not sure we can rely only
on the explicit call to LogicalConfirmReceivedLocation() above to determine
whether the slot was updated.
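
One way to illustrate this concern is the toy model below. It is only a sketch under stated assumptions, not PostgreSQL code: SlotState, slot_state_changed() and mock_advance() are hypothetical stand-ins, and the "side effect during decoding" is simulated rather than taken from the real decoding path. It contrasts relying on the return value of the explicit confirm step with comparing the slot's fields before and after the whole advance.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* simplified stand-in for XLogRecPtr */
typedef uint32_t Xid;   /* simplified stand-in for TransactionId */

/* Hypothetical subset of the slot fields that matter for this discussion. */
typedef struct SlotState
{
    Lsn restart_lsn;
    Lsn confirmed_flush;
    Xid catalog_xmin;
} SlotState;

/* True if any tracked field differs between the two snapshots. */
static bool
slot_state_changed(const SlotState *before, const SlotState *after)
{
    return before->restart_lsn != after->restart_lsn ||
           before->confirmed_flush != after->confirmed_flush ||
           before->catalog_xmin != after->catalog_xmin;
}

/*
 * Toy model of an advance call. The return value reflects only the explicit
 * confirm step at the end, while restart_lsn may already have moved earlier
 * as a simulated side effect of decoding.
 */
static bool
mock_advance(SlotState *slot, Lsn moveto, Lsn restart_seen_while_decoding)
{
    /* Simulated side effect, e.g. while handling a running-xacts record. */
    if (restart_seen_while_decoding > slot->restart_lsn)
        slot->restart_lsn = restart_seen_while_decoding;

    /* Explicit confirm step: reports a change only if confirmed_flush moves. */
    if (moveto > slot->confirmed_flush)
    {
        slot->confirmed_flush = moveto;
        return true;
    }
    return false;
}
```

In this model, a caller that copies the SlotState before calling mock_advance() and then uses slot_state_changed() still notices a restart_lsn movement even when the confirm step itself reports no change.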

Sorry, I couldn't get the chance to look at the patches proposed by
Hou-san to fix this issue, but I'll look at them after the feature
freeze.

* For 0002, I think the current log level has been in use since logical
decoding was introduced, so if we want to change it, we should start
a separate thread. BTW, as a developer, I find those LOG-level
messages quite useful while debugging issues, so I am in favor of
keeping them as they are. However, if the consensus is otherwise, we can
change them as well.

-- 
With Regards,
Amit Kapila.

