On Tue, 10 Mar 2026 at 06:09, Madhav Madhusoodanan
<[email protected]> wrote:
>
> On Fri, Mar 6, 2026 at 2:15 AM Madhav Madhusoodanan
> <[email protected]> wrote:
> >
> > Some parts of the merge flow that I am coming up with are as follows
> > (assuming tuples from index page B are migrated rightwards into C):
> >
> > 1. Leaving B's tuples as it is even after merge, to remove the
> > possible risk of scans "skipping over" tuples. Essentially, the tuples
> > then would be "copied" into C.
> > 2. Marking pages B and C with flags similar to INCOMPLETE_SPLIT (say,
> > MERGE_SRC and MERGE_DEST respectively) before the actual merge
> > process, then marking the pages with another flag upon completion
> > (MERGE_COMPLETE) so that other processes can handle transient merge
> > states.
> > 3. For example, scans that reach page B post-merge (MERGE_SRC +
> > MERGE_COMPLETE) would be made to skip to the page to its right.
> > 4. Updating VACUUM to handle post-merge cleanup (to remove pages such as B).
> >
>
> I was going through the source code to understand whether the
> aforementioned direction of changes would be reasonable.
>
> I was observing `BTPageOpaqueData.btpo_flags` [0] which is a uint16,
> but only 9 bits are used.
>
> Would using a couple bits of the same for this purpose be reasonable?
> Or are they being reserved for future functionality?

They're exclusively for the btree code's own use. Extensions (*) must
not add to (or change the meaning of) those bits, lest they create a
forward incompatibility with core PostgreSQL btree code in newer major
versions; that would corrupt binary-upgraded databases.
Patches against core btree code can use those bits, though, because
forward compatibility is less of an issue there - we don't really
support binary upgrades of manually patched systems, especially if
they have incompatible on-disk data.
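
To make the bit budget concrete, here is a rough standalone sketch. The
first nine defines mirror what nbtree.h has today; the three BTP_MERGE_*
names and their bit positions are purely hypothetical (taken from the
flag names proposed upthread), and merge_src_done() is an invented
helper, not anything in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits currently defined for btpo_flags in src/include/access/nbtree.h */
#define BTP_LEAF             (1 << 0)	/* leaf page */
#define BTP_ROOT             (1 << 1)	/* root page */
#define BTP_DELETED          (1 << 2)	/* page deleted from tree */
#define BTP_META             (1 << 3)	/* meta-page */
#define BTP_HALF_DEAD        (1 << 4)	/* empty, but still in tree */
#define BTP_SPLIT_END        (1 << 5)	/* rightmost page of split group */
#define BTP_HAS_GARBAGE      (1 << 6)	/* page has LP_DEAD tuples */
#define BTP_INCOMPLETE_SPLIT (1 << 7)	/* right sibling's downlink missing */
#define BTP_HAS_FULLXID      (1 << 8)	/* contains BTDeletedPageData */

/* HYPOTHETICAL bits for the proposed merge states; not in PostgreSQL */
#define BTP_MERGE_SRC        (1 << 9)
#define BTP_MERGE_DEST       (1 << 10)
#define BTP_MERGE_COMPLETE   (1 << 11)

#define BTP_EXISTING_MASK    0x01FF	/* bits 0..8 */
#define BTP_MERGE_MASK \
	(BTP_MERGE_SRC | BTP_MERGE_DEST | BTP_MERGE_COMPLETE)

/* The new bits must not collide with existing ones and must fit a uint16 */
static_assert((BTP_EXISTING_MASK & BTP_MERGE_MASK) == 0,
			  "merge bits overlap existing btpo_flags bits");
static_assert(BTP_MERGE_MASK <= UINT16_MAX,
			  "merge bits do not fit in uint16 btpo_flags");

/* True once a merge source page has had all its tuples copied rightwards */
static inline bool
merge_src_done(uint16_t flags)
{
	uint16_t	want = BTP_MERGE_SRC | BTP_MERGE_COMPLETE;

	return (flags & want) == want;
}
```

With this layout, bits 12..15 would still be free afterwards.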

(*): I'm skeptical about whether you could make btree scans handle
concurrently merged pages when that merging is implemented as an
extension and the btree code doesn't know about merges.
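
As a toy illustration of why the scan side has to be involved: the check
for a merged-away page would sit in the code that steps right, next to
the existing test that skips deleted and half-dead pages (P_IGNORE in
nbtree.h). Everything below is a standalone model, not nbtree code;
ToyPage, toy_page_skippable(), toy_step_right() and the BTP_MERGE_* bits
are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Existing nbtree.h bits that P_IGNORE() tests */
#define BTP_DELETED        (1 << 2)
#define BTP_HALF_DEAD      (1 << 4)

/* HYPOTHETICAL merge bits; not in PostgreSQL */
#define BTP_MERGE_SRC      (1 << 9)
#define BTP_MERGE_COMPLETE (1 << 11)

/* Toy stand-in for a leaf page: right-sibling link plus flag bits */
typedef struct ToyPage
{
	int			next;		/* index of right sibling, or -1 if rightmost */
	uint16_t	flags;
} ToyPage;

/* Should a scan pass over this page without reading its tuples? */
static int
toy_page_skippable(const ToyPage *page)
{
	uint16_t	merged = BTP_MERGE_SRC | BTP_MERGE_COMPLETE;

	/* what P_IGNORE() covers today */
	if (page->flags & (BTP_DELETED | BTP_HALF_DEAD))
		return 1;
	/* the extra case a merge-aware scan would need */
	return (page->flags & merged) == merged;
}

/*
 * Return the index of the first readable page at or to the right of
 * 'start', or -1 if the scan runs off the right edge.
 */
static int
toy_step_right(const ToyPage *pages, int start)
{
	int			cur = start;

	assert(pages != NULL);
	while (cur != -1 && toy_page_skippable(&pages[cur]))
		cur = pages[cur].next;
	return cur;
}
```

A page that is only MERGE_SRC (merge still in progress) is deliberately
not skipped here, since its tuples would not yet be guaranteed to exist
on the right sibling.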

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)

