Hi all,

I'd like to raise a contribution workflow concern we're currently encountering in Apache Cloudberry (Incubating) and propose that we establish a preferred approach for handling similar situations going forward.
Contributor *@jiaqizho* submitted a significant pull request: *#1002 – Feature: introduce a high-performance hybrid row-columnar storage engine* <https://github.com/apache/cloudberry/pull/1002>.

The PR contains *300+ commits* and has passed CI. However, because of the number of commits, GitHub's *"Rebase and Merge"* option is disabled. This is a known limitation when a PR's size exceeds certain internal thresholds. As a result, the PR cannot be merged through the web UI, even by committers with full permissions.

In response, the contributor has now *split the PR into four smaller PRs* to work around the UI limitation and proceed with merging.

------------------------------

Why This May Not Be Ideal

While the effort is appreciated, splitting the PR introduces several drawbacks:

- *Review context becomes fragmented* across multiple PRs.
- *Merge complexity increases*, especially when the changes are interdependent.
- *Contributor and reviewer effort multiplies*, with more overhead and duplicated CI runs.
- *It sends a mixed message* to future contributors, suggesting that splitting a PR is the preferred approach in these cases, which isn't necessarily true.

------------------------------

What Other ASF Projects Do

Several other Apache projects handle large PRs with *Git CLI-based merges* rather than by splitting them:

- *Apache Arrow*: Encourages local rebases and merges for large contributions.
- *Apache Spark*: Merges and squashes are typically done via the CLI; splitting is discouraged unless the changes are logically separable.
- *Apache Kafka*: Maintainers use merge scripts <https://cwiki.apache.org/confluence/display/KAFKA/Pull+Request+Workflow> to handle large PRs manually.
- *Apache Flink* and *Apache Beam*: Default to local CLI workflows to preserve history and bypass UI restrictions.

This keeps reviews cohesive and simplifies the overall process for contributors and committers alike.
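As a rough illustration of the kind of scripting involved (a hypothetical helper, not any project's actual merge script), a committer could check how many commits a PR carries before choosing between the web UI and a CLI merge. The script name pr-size.sh, its function names, the remote name origin, the base branch main, and the 100-commit cutoff are all assumptions made for this sketch.

```shell
#!/usr/bin/env bash
# pr-size.sh: hypothetical helper, meant to be sourced (". ./pr-size.sh").
# Assumptions for this sketch: the GitHub remote is "origin", the base
# branch is "main", and 100 commits is an illustrative cutoff only.

LARGE_PR_THRESHOLD=100

# Print how many commits HEAD_REF adds on top of BASE_REF.
count_pr_commits() {
  local base_ref=$1 head_ref=$2
  git rev-list --count "${base_ref}..${head_ref}"
}

# Print "cli" if the PR meets the cutoff, otherwise "ui".
merge_method_for() {
  local n
  n=$(count_pr_commits "$1" "$2") || return 1
  if [ "$n" -ge "$LARGE_PR_THRESHOLD" ]; then
    echo cli
  else
    echo ui
  fi
}

# Fetch the latest main plus PR <number> into local branch pr-<number>,
# then report the commit count and the suggested merge method.
# The leading "+" lets the local branch follow force-pushed PR updates.
check_pr() {
  local pr=$1
  git fetch origin main "+pull/${pr}/head:pr-${pr}" || return 1
  echo "PR #${pr}: $(count_pr_commits origin/main "pr-${pr}") commits," \
    "suggested merge method: $(merge_method_for origin/main "pr-${pr}")"
}
```

Usage, for example: ". ./pr-size.sh && check_pr 1002". Counting with "origin/main..pr-1002" only counts commits the PR adds on top of main, so commits already merged upstream do not inflate the number.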
------------------------------

✅ Recommended Best Practice for Apache Cloudberry

To align with ASF norms and improve maintainability, I propose:

1. *Using Git CLI-based merges* as the standard method for large PRs (e.g., 100 or more commits).
2. *Discouraging contributors from splitting PRs* to work around UI limitations, unless reviewers explicitly request it for clarity or modularity.
3. *Documenting this workflow* in our committer guidelines to ensure consistency.

------------------------------

🔧 Verified CLI Merge Workflow for Large PRs

# 1. Fetch the latest main, then the PR branch directly from GitHub
#    (fetching only the PR ref does not update origin/main)
git fetch origin
git fetch origin pull/1002/head:pax-merge

# 2. (Optional) Rebase onto the up-to-date main; if the rebase needed
#    conflict resolution, rebuild and re-test before pushing
git checkout pax-merge
git rebase origin/main

# 3. Merge into main. --no-ff records an explicit merge commit; after the
#    rebase in step 2, --ff-only instead gives a fully linear history
#    like GitHub's "Rebase and merge"
git checkout main
git pull --ff-only origin main
git merge --no-ff pax-merge

# 4. Push the result to the repository
git push origin main

# (Optional) Clean up the local branch
git branch -d pax-merge

This approach avoids GitHub's UI merge limitations, preserves commit history, and provides a better experience for both contributors and reviewers.

------------------------------

Would love to hear thoughts from the community. If there's agreement, we should document the contributor and committer workflows on our newly enabled wiki.

Best regards,
-=e
Ed Espino
Apache Cloudberry (Incubating) & MADlib