ForeverAngry opened a new pull request, #2434:
URL: https://github.com/apache/iceberg-python/pull/2434
<!-- No specific GitHub issue - this is a new feature enhancement -->
# Rationale for this change
This PR adds branch merge strategies to PyIceberg, bringing Git-like branch
merging to Iceberg table operations. Users can pick the strategy that fits
their workflow.
**Feature Overview:**
Apache Iceberg supports branch operations (create, delete, tag) but lacks
merge capabilities between branches. This PR implements 5 standard merge
strategies commonly used in version control systems:
1. **MERGE**: Classic three-way merge creating a merge commit that preserves
history of both branches
2. **SQUASH**: Condenses all commits from source branch into a single clean
commit on target branch
3. **REBASE**: Creates linear history by replaying commits from source
branch on top of target branch
4. **CHERRY_PICK**: Selects and applies specific individual commits from one
branch to another
5. **FAST_FORWARD**: Moves target branch pointer forward when no divergent
commits exist (no merge commit needed)
**Implementation Details:**
- **Strategy Pattern**: Clean, extensible architecture with abstract base
class and concrete implementations
- **Automatic Detection**: Fast-forward possibility automatically detected
and validated
- **Robust Utilities**: Common ancestor finding, branch validation, and
snapshot traversal utilities
- **Flexible API**: Optional source branch deletion after successful merge
- **Error Handling**: Comprehensive validation with clear error messages for
invalid operations
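The ancestor-finding and fast-forward checks listed above can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: it assumes snapshot lineage is available as a plain `dict` mapping each snapshot ID to its parent ID, and the names `ancestors`, `common_ancestor`, and `can_fast_forward` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for table metadata: snapshot id -> parent id (None for a root).
ParentMap = Dict[int, Optional[int]]


def ancestors(parents: ParentMap, snapshot_id: int) -> List[int]:
    """Return snapshot_id followed by its ancestors, newest first."""
    chain: List[int] = []
    seen = set()
    current: Optional[int] = snapshot_id
    while current is not None:
        if current in seen:
            # Guard against corrupt lineage so traversal cannot loop forever.
            raise ValueError(f"Circular snapshot reference at {current}")
        if current not in parents:
            raise ValueError(f"Snapshot {current} not found")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    return chain


def common_ancestor(parents: ParentMap, source_head: int, target_head: int) -> Optional[int]:
    """Return the newest snapshot reachable from both heads, or None."""
    target_chain = set(ancestors(parents, target_head))
    for snapshot_id in ancestors(parents, source_head):
        if snapshot_id in target_chain:
            return snapshot_id
    return None


def can_fast_forward(parents: ParentMap, source_head: int, target_head: int) -> bool:
    """Fast-forward is possible when the target head is an ancestor of the source head."""
    return target_head in ancestors(parents, source_head)
```

With a lineage such as `{1: None, 2: 1, 3: 2, 4: 2}`, snapshots 3 and 4 diverge from snapshot 2, so a branch at 2 can be fast-forwarded to 3, while a branch at 4 cannot.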
**Use Cases:**
- **Development Workflows**: Feature branch integration with different merge
policies
- **Data Pipeline Management**: Merging experimental data processing
branches back to production
- **Schema Evolution**: Combining schema changes from different development
branches
- **Multi-tenant Environments**: Merging tenant-specific changes while
maintaining isolation
## Are these changes tested?
Yes. The PR adds **35 tests** across multiple categories:
**Core Functionality Tests:**
- **Strategy Implementation**: All 5 merge strategies individually tested
with various scenarios
- **Utility Methods**: Common ancestor finding, fast-forward detection,
branch validation
- **Integration Tests**: End-to-end testing through ManageSnapshots API
**Edge Cases & Error Handling:**
- Missing snapshots and branches with proper error messages
- Circular reference detection (prevents infinite loops)
- Self-merge prevention and validation
- Empty/invalid branch names
- No common ancestor scenarios
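To illustrate the kind of up-front validation these edge cases exercise, here is a minimal sketch. It is not the PR's implementation: `validate_merge_request` and the `refs` mapping (branch name to head snapshot ID) are assumptions made for this example, and the error messages are illustrative only.

```python
from typing import Mapping


def validate_merge_request(refs: Mapping[str, int], source_branch: str, target_branch: str) -> None:
    """Raise ValueError with a specific message if the merge request is invalid.

    `refs` is a hypothetical mapping of branch name -> head snapshot id.
    """
    named = (("Source", source_branch), ("Target", target_branch))

    # Reject empty or whitespace-only names before any lookup.
    for label, name in named:
        if not name or not name.strip():
            raise ValueError(f"{label} branch name must not be empty")

    # A branch cannot be merged into itself.
    if source_branch == target_branch:
        raise ValueError(f"Cannot merge branch '{source_branch}' into itself")

    # Both branches must exist in the table's refs.
    for label, name in named:
        if name not in refs:
            raise ValueError(f"{label} branch '{name}' does not exist")
```

Running the checks in this order means the most specific problem is reported first, so an empty name is never reported as a missing branch.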
**Behavioral Validation:**
- Fast-forward validation specific to FastForwardStrategy
- Strategy selection and factory method testing
- Source branch preservation vs. deletion options
- Consistent return types across all strategies
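A rough sketch of how a strategy factory with a consistent return type might look is shown below. All names here (`MergeKind`, `MergeResult`, `MergeStrategy`, `get_strategy`) are hypothetical and chosen for illustration; the actual classes in this PR may be shaped differently, and only two of the five strategies are registered to keep the example short.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Type


class MergeKind(Enum):
    """Hypothetical enum mirroring the five strategy names in this PR."""
    MERGE = "merge"
    SQUASH = "squash"
    REBASE = "rebase"
    CHERRY_PICK = "cherry_pick"
    FAST_FORWARD = "fast_forward"


@dataclass(frozen=True)
class MergeResult:
    """Single result type returned by every strategy."""
    kind: MergeKind
    source_branch: str
    target_branch: str
    source_deleted: bool


class MergeStrategy(ABC):
    """Abstract base class; concrete strategies only set `kind` and `describe`."""
    kind: MergeKind

    @abstractmethod
    def describe(self) -> str:
        """Return a short human-readable description of the strategy."""

    def merge(self, source_branch: str, target_branch: str,
              delete_source_branch: bool = False) -> MergeResult:
        # A real implementation would rewrite table refs here; this sketch
        # only records what was requested.
        return MergeResult(self.kind, source_branch, target_branch, delete_source_branch)


class SquashStrategy(MergeStrategy):
    kind = MergeKind.SQUASH

    def describe(self) -> str:
        return "collapse source commits into one commit on target"


class FastForwardStrategy(MergeStrategy):
    kind = MergeKind.FAST_FORWARD

    def describe(self) -> str:
        return "move the target pointer to the source head"


_REGISTRY: Dict[MergeKind, Type[MergeStrategy]] = {
    cls.kind: cls for cls in (SquashStrategy, FastForwardStrategy)
}


def get_strategy(kind: MergeKind) -> MergeStrategy:
    """Factory: look up and instantiate the strategy registered for `kind`."""
    try:
        return _REGISTRY[kind]()
    except KeyError:
        raise ValueError(f"No strategy registered for {kind.name}") from None
```

Because every strategy returns the same `MergeResult` type, callers and tests can handle results uniformly regardless of which strategy was selected.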
**Quality Assurance:**
- All existing functionality preserved (no regressions)
- Comprehensive type hints and documentation
- All linting checks pass (ruff, mypy, pydocstyle)
- Mock-based testing for isolation and reliability
## Are there any user-facing changes?
**Yes - New Feature Addition (No Breaking Changes)**
**New Public API:**
```python
from pyiceberg.table.update.snapshot import BranchMergeStrategy

# New enum with 5 merge strategies
BranchMergeStrategy.MERGE
BranchMergeStrategy.SQUASH
BranchMergeStrategy.REBASE
BranchMergeStrategy.CHERRY_PICK
BranchMergeStrategy.FAST_FORWARD

# New method on ManageSnapshots
table.manage_snapshots().merge_branch(
    source_branch="feature",
    target_branch="main",
    strategy=BranchMergeStrategy.SQUASH,
    delete_source_branch=False,  # Optional: preserve or delete source branch
).commit()
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]