Hi Jiaqi,

Thanks again for explaining the reasoning behind splitting the PAX PR. Your concerns about "merge main" are well-taken: it introduces non-linear history, complicates git bisect, and can lead to downstream integration issues. It's clear that the decision to split was made carefully under release pressure, and I appreciate the open dialogue around this.

Looking forward, I’d like to propose a Git CLI-based workflow that can help us avoid splitting large PRs in the future, even when the commit count exceeds GitHub’s "Rebase and Merge" limit in the UI.

This approach allows us to:

- Preserve full commit history (no squash)
- Avoid splitting logically complete work
- Maintain linear history for bisectability and readability
- Cleanly integrate with downstream forks if needed

Proposed Workflow:
------------------

# 1. Rebase the feature branch onto the latest main
git checkout feature/your-branch
git fetch origin
git rebase origin/main

# 2. Push the rebased feature branch
git push --force-with-lease

# 3. After PR approval, ensure main is still current
git fetch origin
git checkout main
git pull origin main

# 4. If main has progressed, rebase the feature branch again
git checkout feature/your-branch
git rebase origin/main
git push --force-with-lease

# 5. Merge the rebased branch into main using CLI
#    (--no-ff records a merge commit on top of the rebased commits;
#    use --ff-only instead if main must stay strictly linear)
git checkout main
git merge --no-ff feature/your-branch
git push origin main

This process:

- Avoids GitHub UI merge limitations
- Keeps the commit graph clean and linear
- Ensures CI validation is accurate and relevant
- Preserves the full contribution context

Next Steps:
-----------

If this general approach makes sense, I’d be happy to help document it in our committer or contributor guidelines. I’d also love to hear from others, especially those maintaining downstream forks or submitting larger features.

Thanks again, Jiaqi, for leading the PAX work and for raising the trade-offs so thoughtfully. It’s through these conversations that we build a better process together.

Best,
-=e

On Thu, Apr 10, 2025 at 7:51 AM jiaqi.zhou <jiaqi...@163.com> wrote:

> Hi all,
>
> My colleagues and I have internally discussed the option of using a "merge
> main" approach to bypass the "100+ commit rebase and merge problem".
>
> Why not "merge main"?
>
> - Non-linear History: Merging main would create a non-linear commit graph.
> - Impact on Git Bisect: This could complicate debugging workflows like git
>   bisect.
> - Downstream Compatibility: Projects forked from CloudBerryDB with
>   divergent codebases might face integration challenges.
>
> Why did we choose to split the PR?
>
> PAX has had CI and code review internally since the project was launched,
> and every commit is complete (that is why we did not choose to squash).
> After the split PRs are merged, the commits are linear.
>
> With the CBDB release approaching, please let us discuss this topic as
> soon as possible.
>
> Thanks
> Jiaqi
>
> At 2025-04-10 22:01:09, "Ed Espino" <esp...@apache.org> wrote:
> >Hi all,
> >
> >I’d like to raise a contribution workflow concern we're currently
> >encountering in Apache Cloudberry (Incubating), and propose that we
> >establish a preferred approach for handling similar situations going
> >forward.
> >
> >Contributor *@jiaqizho* submitted a significant pull request:
> >*#1002 – Feature: introduce a high-performance hybrid row-columnar storage
> >engine <https://github.com/apache/cloudberry/pull/1002>*
> >
> >The PR contains *300+ commits* and has successfully passed CI. However,
> >due to the number of commits, GitHub's *“Rebase and Merge”* option is
> >disabled, a known limitation when the PR size exceeds certain internal
> >thresholds. As a result, the PR cannot be merged via the web UI, even by
> >committers with full permissions.
> >
> >In response, the contributor has now *split the PR into four smaller PRs*
> >in an attempt to work around the UI limitation and proceed with merging.
> >------------------------------
> >Why This May Not Be Ideal
> >
> >While the effort is appreciated, splitting the PR introduces several
> >drawbacks:
> >
> >   - *Review context becomes fragmented* across multiple PRs.
> >   - *Merge complexity increases*, especially when changes are
> >   interdependent.
> >   - *Contributor and reviewer effort multiplies*, with more overhead and
> >   duplicated CI runs.
> >   - *It sends a mixed message* to future contributors that PR splitting
> >   is preferred in these cases, which isn’t necessarily true.
> >
> >------------------------------
> >What Other ASF Projects Do
> >
> >Several other Apache projects handle large PRs by relying on *Git
> >CLI-based merges*, rather than splitting:
> >
> >   - *Apache Arrow*: Encourages local rebases and merges for large
> >   contributions.
> >   - *Apache Spark*: Merges and squashes are typically done via CLI;
> >   splitting is discouraged unless changes are logically separable.
> >   - *Apache Kafka*: Maintainers use merge scripts
> >   <https://cwiki.apache.org/confluence/display/KAFKA/Pull+Request+Workflow>
> >   to handle large PRs manually.
> >   - *Apache Flink* and *Apache Beam*: Default to local CLI workflows to
> >   maintain history and bypass UI restrictions.
> >
> >This keeps reviews cohesive and simplifies the overall process for
> >contributors and committers alike.
> >------------------------------
> >✅ Recommended Best Practice for Apache Cloudberry
> >
> >To align with ASF norms and improve maintainability, I propose:
> >
> >   1. *Using Git CLI-based merges* as the standard method for large PRs
> >   (e.g., 100+ commits).
> >   2. *Discouraging contributors from splitting PRs* to work around UI
> >   limitations, unless explicitly requested by reviewers for clarity or
> >   modularity.
> >   3. *Documenting this workflow* in our committer guidelines to ensure
> >   consistency.
> >
> >------------------------------
> >Verified CLI Merge Workflow for Large PRs
> >
> ># 1. Fetch the PR branch directly from GitHub
> >git fetch origin pull/1002/head:pax-merge
> >
> ># 2. Optionally rebase for a linear history
> >git checkout pax-merge
> >git rebase origin/main
> >
> ># 3. Merge into main
> >git checkout main
> >git pull origin main
> >git merge pax-merge --no-ff
> >
> ># 4. Push the result to the repository
> >git push origin main
> >
> ># (Optional) Clean up
> >git branch -d pax-merge
> >
> >This approach avoids GitHub’s UI merge limitations, preserves commit
> >history, and maintains a better experience for both contributors and
> >reviewers.
> >------------------------------
> >
> >Would love to hear thoughts from the community. If there's agreement, we
> >should add contributing and committer workflows to our newly enabled wiki.
> >
> >Best regards,
> >-=e
> >Ed Espino
> >Apache Cloudberry (Incubating) & MADlib
>

--
Ed Espino
Apache Cloudberry (Incubating) & MADlib