Hi all,

My colleagues and I have discussed internally whether a "merge main" approach
could be used to work around the "100+ commit rebase and merge" problem.


Why not "merge main"?

- Non-linear History: Merging main would create a non-linear commit graph.

- Impact on Git Bisect: This could complicate debugging workflows like git 
bisect.

- Downstream Compatibility: Projects forked from CloudBerryDB with divergent 
codebases might face integration challenges.
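
To make the non-linear history point concrete, here is a minimal sketch,
assuming a local clone with the branches already fetched. It counts the merge
commits a branch would add on top of its base; a result of 0 means the range
is linear. The helper name count_merges and the branch names in the usage
comment are hypothetical; `git rev-list --merges --count` is standard Git.

```shell
#!/bin/sh
# count_merges BASE TIP
# Print the number of merge commits reachable from TIP but not from BASE.
# 0 means the BASE..TIP range is a linear chain of commits.
count_merges() {
    git rev-list --merges --count "$1..$2"
}

# Hypothetical usage (branch names are placeholders):
#   count_merges origin/main my-feature-branch
```

A branch that has had main merged into it reports at least 1 here, and each
of those merge commits is an extra step that git bisect has to walk through.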




Why did we choose to split the PR?

PAX has had CI and code review internally since the project was launched, and
every commit is complete on its own (which is why we chose not to squash).
Once the split PRs are merged, the commit history remains linear.
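
As an illustration only, and not necessarily how the PAX PRs were actually
prepared, the sketch below shows one way a linear range of commits can be cut
into consecutive branches without rewriting any commit. It assumes the range
BASE..TIP is linear; the function name split_range and the branch prefix are
hypothetical.

```shell
#!/bin/sh
# split_range BASE TIP PARTS PREFIX
# Create branches PREFIX-1 .. PREFIX-PARTS, each pointing at the last commit
# of one consecutive slice of the (assumed linear) range BASE..TIP.
# No commit is rewritten; the final branch points at the same commit as TIP.
split_range() {
    base=$1; tip=$2; parts=$3; prefix=$4
    total=$(git rev-list --count "$base..$tip")
    if [ "$total" -lt "$parts" ]; then
        echo "range has fewer commits ($total) than parts ($parts)" >&2
        return 1
    fi
    i=1
    while [ "$i" -le "$parts" ]; do
        # 1-based position (oldest first) of the last commit in slice i
        end=$(( total * i / parts ))
        commit=$(git rev-list --reverse "$base..$tip" | sed -n "${end}p")
        git branch -f "$prefix-$i" "$commit"
        i=$(( i + 1 ))
    done
}

# Hypothetical usage (names are placeholders):
#   split_range origin/main my-feature-branch 4 my-feature-part
```

Because each part is a prefix of the next, merging the parts in order by
fast-forward or rebase reproduces the same linear sequence of commits as the
original branch.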




With the CBDB release approaching, let's discuss this topic as soon as
possible.

Thanks
Jiaqi


On 2025-04-10 22:01:09, "Ed Espino" <esp...@apache.org> wrote:
>Hi all,
>
>I’d like to raise a contribution workflow concern we're currently
>encountering in Apache Cloudberry (Incubating), and propose that we
>establish a preferred approach for handling similar situations going
>forward.
>
>Contributor *@jiaqizho* submitted a significant pull request:
>*#1002 – Feature: introduce a high-performance hybrid row-columnar storage
>engine <https://github.com/apache/cloudberry/pull/1002>*
>
>The PR contains *300+ commits* and has successfully passed CI. However, due
>to the number of commits, GitHub's *“Rebase and Merge”* option is disabled
>— a known limitation when the PR size exceeds certain internal thresholds.
>As a result, the PR cannot be merged via the web UI, even by committers
>with full permissions.
>
>In response, the contributor has now *split the PR into four smaller PRs*
>in an attempt to work around the UI limitation and proceed with merging.
>------------------------------
>Why This May Not Be Ideal
>
>While the effort is appreciated, splitting the PR introduces several
>drawbacks:
>
>   - *Review context becomes fragmented* across multiple PRs.
>   - *Merge complexity increases*, especially when changes are interdependent.
>   - *Contributor and reviewer effort multiplies*, with more overhead and
>     duplicated CI runs.
>   - *It sends a mixed message* to future contributors that PR splitting is
>     preferred in these cases, which isn’t necessarily true.
>
>------------------------------
>What Other ASF Projects Do
>
>Several other Apache projects handle large PRs by relying on *Git CLI-based
>merges*, rather than splitting:
>
>   - *Apache Arrow*: Encourages local rebases and merges for large
>     contributions.
>   - *Apache Spark*: Merges and squashes are typically done via CLI;
>     splitting is discouraged unless changes are logically separable.
>   - *Apache Kafka*: Maintainers use merge scripts
>     <https://cwiki.apache.org/confluence/display/KAFKA/Pull+Request+Workflow>
>     to handle large PRs manually.
>   - *Apache Flink* and *Apache Beam*: Default to local CLI workflows to
>     maintain history and bypass UI restrictions.
>
>This keeps reviews cohesive and simplifies the overall process for
>contributors and committers alike.
>------------------------------
>✅ Recommended Best Practice for Apache Cloudberry
>
>To align with ASF norms and improve maintainability, I propose:
>
>   1. *Using Git CLI-based merges* as the standard method for large PRs (e.g.,
>      100 or more commits).
>   2. *Discouraging contributors from splitting PRs* to work around UI
>      limitations, unless explicitly requested by reviewers for clarity or
>      modularity.
>   3. *Documenting this workflow* in our committer guidelines to ensure
>      consistency.
>
>------------------------------
>Verified CLI Merge Workflow for Large PRs
>
># 1. Fetch the PR branch directly from GitHub
>git fetch origin pull/1002/head:pax-merge
>
># 2. Optionally rebase for a linear history
>git checkout pax-merge
>git rebase origin/main
>
># 3. Merge into main
>git checkout main
>git pull origin main
>git merge pax-merge --no-ff
>
># 4. Push the result to the repository
>git push origin main
>
># (Optional) Clean up
>git branch -d pax-merge
>
>This approach avoids GitHub’s UI merge limitations, preserves commit
>history, and maintains a better experience for both contributors and
>reviewers.
>------------------------------
>
>Would love to hear thoughts from the community. If there's agreement, we
>should add contributing and committer workflows to our newly enabled wiki.
>
>Best regards,
>-=e
>Ed Espino
>Apache Cloudberry (Incubating) & MADlib
