Hi all,

I’d like to raise a contribution workflow concern we're currently
encountering in Apache Cloudberry (Incubating), and propose that we
establish a preferred approach for handling similar situations going
forward.

Contributor *@jiaqizho* submitted a significant pull request:
*#1002 – Feature: introduce a high-performance hybrid row-columnar storage
engine <https://github.com/apache/cloudberry/pull/1002>*

The PR contains *300+ commits* and has successfully passed CI. However, due
to the number of commits, GitHub's *“Rebase and Merge”* option is disabled
— a known limitation when the PR size exceeds certain internal thresholds.
As a result, the PR cannot be merged via the web UI, even by committers
with full permissions.

In response, the contributor has now *split the PR into four smaller PRs*
in an attempt to work around the UI limitation and proceed with merging.
------------------------------
Why This May Not Be Ideal

While the effort is appreciated, splitting the PR introduces several
drawbacks:

   - *Review context becomes fragmented* across multiple PRs.

   - *Merge complexity increases*, especially when changes are interdependent.

   - *Contributor and reviewer effort multiplies*, with more overhead and
     duplicated CI runs.

   - *It sends a mixed message* to future contributors that PR splitting is
     preferred in these cases, which isn’t necessarily true.

------------------------------
What Other ASF Projects Do

Several other Apache projects handle large PRs by relying on *Git CLI-based
merges*, rather than splitting:

   - *Apache Arrow*: Encourages local rebases and merges for large
     contributions.

   - *Apache Spark*: Merges and squashes are typically done via CLI;
     splitting is discouraged unless changes are logically separable.

   - *Apache Kafka*: Maintainers use merge scripts
     <https://cwiki.apache.org/confluence/display/KAFKA/Pull+Request+Workflow>
     to handle large PRs manually.

   - *Apache Flink* and *Apache Beam*: Default to local CLI workflows to
     maintain history and bypass UI restrictions.

This keeps reviews cohesive and simplifies the overall process for
contributors and committers alike.
------------------------------
✅ Recommended Best Practice for Apache Cloudberry

To align with ASF norms and improve maintainability, I propose:

   1. *Using Git CLI-based merges* as the standard method for large PRs
      (e.g., 100 or more commits).

   2. *Discouraging contributors from splitting PRs* to work around UI
      limitations, unless explicitly requested by reviewers for clarity or
      modularity.

   3. *Documenting this workflow* in our committer guidelines to ensure
      consistency.
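
As a rough illustration of point 1, a committer could check how many commits a PR carries before deciding which merge path to take. This is only a sketch: the helper names `needs_cli_merge` and `pr_commit_count` are hypothetical, and the threshold of 100 is the example figure from this proposal, not a limit documented by GitHub.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the default threshold are assumptions
# made for illustration, not part of any existing Cloudberry tooling.
THRESHOLD="${THRESHOLD:-100}"

# Succeeds (exit status 0) when the commit count meets or exceeds the threshold.
needs_cli_merge() {
    [ "$1" -ge "$2" ]
}

# Count the commits reachable from the PR branch but not from the base branch.
# Both arguments are ordinary git revisions, e.g. origin/main and pax-merge.
pr_commit_count() {
    git rev-list --count "$1..$2"
}

# Example usage, run inside a clone after fetching the PR branch:
#   count=$(pr_commit_count origin/main pax-merge)
#   if needs_cli_merge "$count" "$THRESHOLD"; then
#       echo "PR has $count commits: merge via CLI"
#   fi
```

Keeping the comparison separate from the git call makes it easy to adjust the threshold once the community agrees on a number.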

------------------------------
🔧 Verified CLI Merge Workflow for Large PRs

# 1. Refresh origin/main and fetch the PR branch directly from GitHub
#    (assumes "origin" points at apache/cloudberry)
git fetch origin
git fetch origin pull/1002/head:pax-merge

# 2. Optionally rebase for a linear history
#    (this rewrites commit hashes, so GitHub may not mark the PR as merged)
git checkout pax-merge
git rebase origin/main

# 3. Merge into main
git checkout main
git pull --ff-only origin main
git merge --no-ff pax-merge

# 4. Push the result to the repository
git push origin main

# (Optional) Clean up
git branch -d pax-merge

This approach avoids GitHub’s UI merge limitations, preserves commit
history, and maintains a better experience for both contributors and
reviewers.
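
Before pushing, it may be worth confirming that the merge really brought in every commit from the PR branch. The sketch below uses `git merge-base --is-ancestor` for that; the function name `pr_is_merged` is hypothetical, and the branch names are the ones used in the workflow above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the tip of the PR branch (first argument)
# is an ancestor of the target branch (second argument), which means every
# commit on the PR branch is reachable from the target.
pr_is_merged() {
    git merge-base --is-ancestor "$1" "$2"
}

# Example usage after step 3 and before step 4:
#   if pr_is_merged pax-merge main; then
#       echo "all PR commits are on main; safe to push"
#   else
#       echo "main is missing PR commits; do not push yet"
#   fi
```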
------------------------------

Would love to hear thoughts from the community. If there's agreement, we
should add contributing and committer workflows to our newly enabled wiki.

Best regards,
-=e
Ed Espino
Apache Cloudberry (Incubating) & MADlib
