Hi Jiaqi,

Thanks again for explaining the reasoning behind splitting the PAX PR.
Your concerns about "merge main" are well-taken — it introduces
non-linear history, complicates git bisect, and can lead to downstream
integration issues. It's clear that the decision to split was made
carefully under release pressure, and I appreciate the open dialogue
around this.

Looking forward, I’d like to propose a Git CLI-based workflow that lets
us avoid splitting large PRs in the future, even when the commit count
exceeds the limit at which GitHub disables "Rebase and Merge" in the UI.

This approach allows us to:
- Preserve full commit history (no squash)
- Avoid splitting logically complete work
- Maintain linear history for bisectability and readability
- Cleanly integrate with downstream forks if needed
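
To make the linear-history point concrete, here is a minimal sketch of
how one could check it. It assumes a hypothetical helper named
count_merges (my own illustration, not an existing project script) and
placeholder branch names. The only Git call it uses is git rev-list with
the --merges and --count options.

```shell
# Hypothetical helper: prints the number of merge commits reachable from
# BRANCH but not from BASE. 0 means BRANCH is linear relative to BASE.
count_merges() {
    base="$1"
    branch="$2"
    git rev-list --merges --count "$base..$branch"
}

# Example call (branch names are placeholders):
#   count_merges origin/main feature/your-branch
```

A result of 0 means the range holds no merge commits, which is the
property that keeps git bisect straightforward.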

Proposed Workflow:
------------------

    # 1. Rebase the feature branch onto the latest main
    git checkout feature/your-branch
    git fetch origin
    git rebase origin/main

    # 2. Push the rebased feature branch
    git push --force-with-lease

    # 3. After PR approval, ensure main is still current
    git fetch origin
    git checkout main
    git pull origin main

    # 4. If main has progressed, rebase the feature branch again
    git checkout feature/your-branch
    git rebase origin/main
    git push --force-with-lease

    # 5. Merge the rebased branch into main using CLI
    git checkout main
    git merge feature/your-branch --no-ff
    git push origin main

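This process:
- Avoids GitHub UI merge limitations
- Keeps the feature commits in a single linear run on top of main
  (--no-ff adds one merge commit; --ff-only would avoid even that)
- Keeps CI results relevant, since CI runs on the branch as rebased onto
  the latest main
- Preserves the full contribution context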
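
Step 5 only gives the history described above if the feature branch
already contains everything on main. As a rough sketch, a guard like the
following could run just before the merge. The helper name
is_rebased_onto is hypothetical (my illustration, not an existing
script); git merge-base --is-ancestor is the only Git call it uses.

```shell
# Hypothetical helper: succeeds (exit 0) only when BASE is an ancestor of
# BRANCH, i.e. BRANCH already sits on top of BASE and needs no further rebase.
is_rebased_onto() {
    base="$1"
    branch="$2"
    git merge-base --is-ancestor "$base" "$branch"
}

# Example use before step 5 (branch names follow the workflow above):
#   if is_rebased_onto origin/main feature/your-branch; then
#       echo "safe to merge"
#   else
#       echo "main has moved; rebase again (step 4)" >&2
#   fi
```

If the check fails, repeat step 4. Swapping --no-ff for --ff-only in
step 5 enforces the same condition, since Git then refuses any merge
that is not a fast-forward.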

Next Steps:
-----------

If this general approach makes sense, I’d be happy to help document it
in our committer or contributor guidelines. I’d also love to hear from
others — especially those maintaining downstream forks or submitting
larger features.

Thanks again, Jiaqi, for leading the PAX work and for raising the
trade-offs so thoughtfully. It’s through these conversations that we
build a better process together.

Best,
-=e


On Thu, Apr 10, 2025 at 7:51 AM jiaqi.zhou <jiaqi...@163.com> wrote:

> Hi all,
>
> My colleagues and I have internally discussed the option of using a "merge
> main" approach to bypass the "100+ commit rebase and merge problem".
>
> Why not "merge main"?
>
> - Non-linear History: Merging main would create a non-linear commit graph.
>
> - Impact on Git Bisect: This could complicate debugging workflows like git
> bisect.
>
> - Downstream Compatibility: Projects forked from CloudBerryDB with
> divergent codebases might face integration challenges.
>
> Why did we choose to split the PR?
>
> PAX has had CI and code review internally since the project was launched,
> and every commit is complete (that is why we did not choose to squash).
> Once the split PRs are merged, the commits are linear.
>
> With the CBDB release approaching, please let us discuss this topic as
> soon as possible.
>
> Thanks
> Jiaqi
>
>
> On 2025-04-10 22:01:09, "Ed Espino" <esp...@apache.org> wrote:
> >Hi all,
> >
> >I’d like to raise a contribution workflow concern we're currently
> >encountering in Apache Cloudberry (Incubating), and propose that we
> >establish a preferred approach for handling similar situations going
> >forward.
> >
> >Contributor *@jiaqizho* submitted a significant pull request:
> >*#1002 – Feature: introduce a high-performance hybrid row-columnar storage
> >engine <https://github.com/apache/cloudberry/pull/1002>*
> >
> >The PR contains *300+ commits* and has successfully passed CI. However, due
> >to the number of commits, GitHub's *“Rebase and Merge”* option is disabled
> >— a known limitation when the PR size exceeds certain internal thresholds.
> >As a result, the PR cannot be merged via the web UI, even by committers
> >with full permissions.
> >
> >In response, the contributor has now *split the PR into four smaller PRs*
> >in an attempt to work around the UI limitation and proceed with merging.
> >------------------------------
> >Why This May Not Be Ideal
> >
> >While the effort is appreciated, splitting the PR introduces several
> >drawbacks:
> >
> >   - *Review context becomes fragmented* across multiple PRs.
> >   - *Merge complexity increases*, especially when changes are
> >     interdependent.
> >   - *Contributor and reviewer effort multiplies*, with more overhead and
> >     duplicated CI runs.
> >   - *It sends a mixed message* to future contributors that PR splitting is
> >     preferred in these cases, which isn’t necessarily true.
> >
> >------------------------------
> >What Other ASF Projects Do
> >
> >Several other Apache projects handle large PRs by relying on *Git CLI-based
> >merges*, rather than splitting:
> >
> >   - *Apache Arrow*: Encourages local rebases and merges for large
> >     contributions.
> >   - *Apache Spark*: Merges and squashes are typically done via CLI;
> >     splitting is discouraged unless changes are logically separable.
> >   - *Apache Kafka*: Maintainers use merge scripts
> >     <https://cwiki.apache.org/confluence/display/KAFKA/Pull+Request+Workflow>
> >     to handle large PRs manually.
> >   - *Apache Flink* and *Apache Beam*: Default to local CLI workflows to
> >     maintain history and bypass UI restrictions.
> >
> >This keeps reviews cohesive and simplifies the overall process for
> >contributors and committers alike.
> >------------------------------
> >✅ Recommended Best Practice for Apache Cloudberry
> >
> >To align with ASF norms and improve maintainability, I propose:
> >
> >   1. *Using Git CLI-based merges* as the standard method for large PRs
> >      (e.g., 100+ commits or more).
> >   2. *Discouraging contributors from splitting PRs* to work around UI
> >      limitations, unless explicitly requested by reviewers for clarity or
> >      modularity.
> >   3. *Documenting this workflow* in our committer guidelines to ensure
> >      consistency.
> >
> >------------------------------
> >Verified CLI Merge Workflow for Large PRs
> >
> ># 1. Fetch the PR branch directly from GitHub
> >git fetch origin pull/1002/head:pax-merge
> >
> ># 2. Optionally rebase for a linear history
> >git checkout pax-merge
> >git rebase origin/main
> >
> ># 3. Merge into main
> >git checkout main
> >git pull origin main
> >git merge pax-merge --no-ff
> >
> ># 4. Push the result to the repository
> >git push origin main
> >
> ># (Optional) Clean up
> >git branch -d pax-merge
> >
> >This approach avoids GitHub’s UI merge limitations, preserves commit
> >history, and maintains a better experience for both contributors and
> >reviewers.
> >------------------------------
> >
> >Would love to hear thoughts from the community. If there's agreement, we
> >should add contributing and committer workflows to our newly enabled wiki.
> >
> >Best regards,
> >-=e
> >Ed Espino
> >Apache Cloudberry (Incubating) & MADlib
>


-- 
Ed Espino
Apache Cloudberry (Incubating) & MADlib
