Hi Ed,

I think you didn't get my point, so let me show you what happens if we
follow the workflow:

1. git checkout main
2. git checkout -b test-commit200
3. (branch test-commit200) bash 200commit.sh

```
# 200commit.sh (run with bash: {1..200} is a bash brace expansion)
for i in {1..200}; do
  echo "Commit $i" >> test-file.txt
  git add test-file.txt
  git commit -m "Commit $i"
done
```

4. git push origin test-commit200
5. git checkout main
6. (branch main) git merge origin/test-commit200 --no-ff
7. (branch main) git log

```
commit 49880256dce3c17789d31b3d173686d579e99e2e (HEAD -> main)
Merge: 49d49b87eee 3a48c1a66a7
Author: zhoujiaqi <zhouji...@hashdata.cn>
Date:   Fri Apr 11 17:54:36 2025 +0800

    Merge remote-tracking branch 'origin/test-commit200' into main

commit 3a48c1a66a7931f50e0200daf2b5fb5e69df9ec9 (origin/test-commit200, test-commit200)
```

The "Merge ... into main" commit at the top is a merge commit, so the
resulting history is neither clean nor linear.

Also, I don't think it is a good idea for us to push directly to main.

I can accept two ways to merge into main:

1. Cherry-pick + push to main (pushed by an admin)
  - git checkout main
  - git cherry-pick {PAX first commit}^...{PAX last commit}
  - git push origin main  // This step should be done by a designated person, not me (I am just a developer)

2. Split PRs
  - This is what I am doing now. The reasons have already been outlined
    in my earlier emails, and I prefer this approach.

Thanks
Jiaqi


At 2025-04-11 00:11:18, "Ed Espino" <esp...@apache.org> wrote:
>Hi Jiaqi,
>
>Thanks again for explaining the reasoning behind splitting the PAX PR.
>Your concerns about "merge main" are well-taken: it introduces
>non-linear history, complicates git bisect, and can lead to downstream
>integration issues. It's clear that the decision to split was made
>carefully under release pressure, and I appreciate the open dialogue
>around this.
>
>Looking forward, I'd like to propose a Git CLI-based workflow that can
>help us avoid splitting large PRs in the future, even when the commit
>count exceeds GitHub's "Rebase and Merge" limit in the UI.
>
>This approach allows us to:
>- Preserve full commit history (no squash)
>- Avoid splitting logically complete work
>- Maintain linear history for bisectability and readability
>- Cleanly integrate with downstream forks if needed
>
>Proposed Workflow:
>------------------
>
>  # 1. Rebase the feature branch onto the latest main
>  git checkout feature/your-branch
>  git fetch origin
>  git rebase origin/main
>
>  # 2. Push the rebased feature branch
>  git push --force-with-lease
>
>  # 3. After PR approval, ensure main is still current
>  git fetch origin
>  git checkout main
>  git pull origin main
>
>  # 4. If main has progressed, rebase the feature branch again
>  git checkout feature/your-branch
>  git rebase origin/main
>  git push --force-with-lease
>
>  # 5. Merge the rebased branch into main using CLI
>  git checkout main
>  git merge feature/your-branch --no-ff
>  git push origin main
>
>This process:
>- Avoids GitHub UI merge limitations
>- Keeps the commit graph clean and linear
>- Ensures CI validation is accurate and relevant
>- Preserves the full contribution context
>
>Next Steps:
>-----------
>
>If this general approach makes sense, I'd be happy to help document it
>in our committer or contributor guidelines. I'd also love to hear from
>others, especially those maintaining downstream forks or submitting
>larger features.
>
>Thanks again Jiaqi for leading the PAX work and for raising the
>trade-offs so thoughtfully. It's through these conversations that we
>build a better process together.
>
>Best,
>-=e
>
>
>On Thu, Apr 10, 2025 at 7:51 AM jiaqi.zhou <jiaqi...@163.com> wrote:
>
>> Hi all,
>>
>> My colleagues and I have internally discussed the option of using a "merge
>> main" approach to bypass the "100+ commit rebase and merge problem".
>>
>> Why not "merge main"?
>>
>> - Non-linear History: Merging main would create a non-linear commit graph.
>>
>> - Impact on Git Bisect: This could complicate debugging workflows like git
>> bisect.
>>
>> - Downstream Compatibility: Projects forked from CloudBerryDB with
>> divergent codebases might face integration challenges.
>>
>> Why choose to split the PR?
>>
>> PAX has had CI + code review internally since the project was launched, and
>> every commit is complete (that is why we don't choose squash). And after
>> the split PRs are merged, the commits are linear.
>>
>> With the CBDB release approaching, please let us discuss this topic as
>> soon as possible.
>>
>> Thanks
>> Jiaqi
>>
>>
>> At 2025-04-10 22:01:09, "Ed Espino" <esp...@apache.org> wrote:
>> >Hi all,
>> >
>> >I'd like to raise a contribution workflow concern we're currently
>> >encountering in Apache Cloudberry (Incubating), and propose that we
>> >establish a preferred approach for handling similar situations going
>> >forward.
>> >
>> >Contributor *@jiaqizho* submitted a significant pull request:
>> >*#1002 – Feature: introduce a high-performance hybrid row-columnar storage
>> >engine <https://github.com/apache/cloudberry/pull/1002>*
>> >
>> >The PR contains *300+ commits* and has successfully passed CI. However, due
>> >to the number of commits, GitHub's *"Rebase and Merge"* option is disabled,
>> >a known limitation when the PR size exceeds certain internal thresholds.
>> >As a result, the PR cannot be merged via the web UI, even by committers
>> >with full permissions.
>> >
>> >In response, the contributor has now *split the PR into four smaller PRs*
>> >in an attempt to work around the UI limitation and proceed with merging.
>> >------------------------------
>> >Why This May Not Be Ideal
>> >
>> >While the effort is appreciated, splitting the PR introduces several
>> >drawbacks:
>> >
>> >   - *Review context becomes fragmented* across multiple PRs.
>> >   - *Merge complexity increases*, especially when changes are
>> >   interdependent.
>> >   - *Contributor and reviewer effort multiplies*, with more overhead and
>> >   duplicated CI runs.
>> >   - *It sends a mixed message* to future contributors that PR splitting is
>> >   preferred in these cases, which isn't necessarily true.
>> >
>> >------------------------------
>> >What Other ASF Projects Do
>> >
>> >Several other Apache projects handle large PRs by relying on *Git CLI-based
>> >merges*, rather than splitting:
>> >
>> >   - *Apache Arrow*: Encourages local rebases and merges for large
>> >   contributions.
>> >   - *Apache Spark*: Merges and squashes are typically done via CLI;
>> >   splitting is discouraged unless changes are logically separable.
>> >   - *Apache Kafka*: Maintainers use merge scripts
>> >   <https://cwiki.apache.org/confluence/display/KAFKA/Pull+Request+Workflow>
>> >   to handle large PRs manually.
>> >   - *Apache Flink* and *Apache Beam*: Default to local CLI workflows to
>> >   maintain history and bypass UI restrictions.
>> >
>> >This keeps reviews cohesive and simplifies the overall process for
>> >contributors and committers alike.
>> >------------------------------
>> >✅ Recommended Best Practice for Apache Cloudberry
>> >
>> >To align with ASF norms and improve maintainability, I propose:
>> >
>> >   1. *Using Git CLI-based merges* as the standard method for large PRs
>> >   (e.g., 100+ commits or more).
>> >   2. *Discouraging contributors from splitting PRs* to work around UI
>> >   limitations, unless explicitly requested by reviewers for clarity or
>> >   modularity.
>> >   3. *Documenting this workflow* in our committer guidelines to ensure
>> >   consistency.
>> >
>> >------------------------------
>> >Verified CLI Merge Workflow for Large PRs
>> >
>> ># 1. Fetch the PR branch directly from GitHub
>> >git fetch origin pull/1002/head:pax-merge
>> >
>> ># 2. Optionally rebase for a linear history
>> >git checkout pax-merge
>> >git rebase origin/main
>> >
>> ># 3. Merge into main
>> >git checkout main
>> >git pull origin main
>> >git merge pax-merge --no-ff
>> >
>> ># 4. Push the result to the repository
>> >git push origin main
>> >
>> ># (Optional) Clean up
>> >git branch -d pax-merge
>> >
>> >This approach avoids GitHub's UI merge limitations, preserves commit
>> >history, and maintains a better experience for both contributors and
>> >reviewers.
>> >------------------------------
>> >
>> >Would love to hear thoughts from the community. If there's agreement, we
>> >should add contributing and committer workflows to our newly enabled wiki.
>> >
>> >Best regards,
>> >-=e
>> >Ed Espino
>> >Apache Cloudberry (Incubating) & MADlib
>> >
>
>--
>Ed Espino
>Apache Cloudberry (Incubating) & MADlib
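
The disagreement in this thread comes down to what `git merge --no-ff` does to history. The sketch below is one way to check that locally. It is not part of either proposed workflow: it builds a throwaway repository and counts the merge commits produced by `--no-ff` versus `--ff-only`. The branch names (`feature`, `main-noff`, `main-ff`), the temporary directory, and the committer identity are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: compare `git merge --no-ff` with `git merge --ff-only`
# in a throwaway repository. All names here are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Example Dev"
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"

echo base > base.txt
git add base.txt
git commit -q -m "Base commit"

# A small feature branch (3 commits instead of 200).
git checkout -q -b feature
for i in 1 2 3; do
    echo "Commit $i" >> test-file.txt
    git add test-file.txt
    git commit -q -m "Commit $i"
done

# Variant A: --no-ff always records a merge commit with two parents.
git checkout -q -b main-noff main
git merge -q --no-ff -m "Merge branch 'feature'" feature
noff_merges=$(git rev-list --merges --count main..main-noff)

# Variant B: --ff-only just moves the branch pointer; history stays linear.
git checkout -q -b main-ff main
git merge -q --ff-only feature
ff_merges=$(git rev-list --merges --count main..main-ff)

echo "merge commits with --no-ff:   $noff_merges"
echo "merge commits with --ff-only: $ff_merges"
```

Under these assumptions the script reports one merge commit for the `--no-ff` variant and none for the `--ff-only` variant. `--ff-only` succeeds here only because `feature` already sits directly on top of `main`, which is the state a rebase onto `origin/main` produces.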