bk-mz opened a new issue, #13957:
URL: https://github.com/apache/iceberg/issues/13957
### Feature Request / Improvement
## Problem Statement
When a MERGE operation affects a large number of partitions, Iceberg
currently processes it atomically as a single logical operation.
This means all affected partitions are included in one execution plan,
requiring Spark to shuffle data across all partitions simultaneously.
The planner cannot isolate partition-specific operations, even though
partitions are physically independent storage units.
This creates excessive shuffle overhead in Spark that scales poorly with the
number of affected partitions, even when data volume is modest.
We've observed that this shuffle dominates processing time, and reducing
batch size doesn't help if the partition count remains high.
## Proposed Enhancement
Enhance Iceberg to natively support partition-aware parallelism for MERGE
operations:
1. Add the capability to automatically split MERGE operations along partition
boundaries.
2. Leverage partition independence to reduce shuffle scope per operation.
3. Process partition groups concurrently while maintaining proper commit
semantics.
## Impact
This enhancement would:
- Improve performance for workloads with high partition counts.
- Reduce resource consumption for merge operations.
- Better handle late-arriving data affecting historical partitions.
- Eliminate the need for application-level workarounds.
## Current Workaround
The current workaround is an application-level solution that:
1. Splits the batch into partition-specific chunks.
2. Processes the chunks in parallel with controlled concurrency:
   - Each chunk contains a small subset of the partitions to update.
   - Each chunk runs a separate MERGE operation on its partition set.
   - Multiple chunks execute concurrently, up to a configured limit.
3. Reduces shuffle cost per operation by limiting partition scope.
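
For illustration, the workaround steps above could be sketched roughly as follows. This is a minimal sketch, not the code we actually run: the helper names (`chunk_partitions`, `build_merge_sql`, `run_chunks`), the table and column names, and the `UPDATE SET *` / `INSERT *` clauses are all assumptions. The Spark call is passed in as a callable so the chunking and concurrency logic stands on its own.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterator, List, Sequence


def chunk_partitions(partitions: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most chunk_size partition values."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    for start in range(0, len(partitions), chunk_size):
        yield list(partitions[start:start + chunk_size])


def build_merge_sql(target: str, source: str, partition_col: str,
                    key_col: str, chunk: Sequence[str]) -> str:
    """Build a MERGE statement restricted to one chunk of partition values.

    Hypothetical schema: key_col identifies a row and partition_col is the
    partition column on both the target table and the source view.
    """
    values = ", ".join("'" + v.replace("'", "''") + "'" for v in chunk)
    return (
        f"MERGE INTO {target} t "
        f"USING (SELECT * FROM {source} WHERE {partition_col} IN ({values})) s "
        f"ON t.{key_col} = s.{key_col} "
        f"AND t.{partition_col} = s.{partition_col} "
        f"AND t.{partition_col} IN ({values}) "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def run_chunks(chunks: Sequence[List[str]],
               run_merge: Callable[[List[str]], None],
               max_concurrency: int) -> Dict[int, Exception]:
    """Run run_merge on every chunk with at most max_concurrency in flight.

    Returns the failures keyed by chunk index so the caller can retry them.
    """
    failures: Dict[int, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(run_merge, chunk): i for i, chunk in enumerate(chunks)}
        for future in as_completed(futures):
            error = future.exception()
            if error is not None:
                failures[futures[future]] = error
    return failures
```

With a live `SparkSession` named `spark`, `run_merge` could be something like `lambda chunk: spark.sql(build_merge_sql("db.target", "updates", "event_date", "id", chunk))`. Each chunk then becomes its own Iceberg commit, which is where the downsides listed below come from.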
### Downsides of Workaround
1. **Increased Complexity**: Requires custom application logic to handle
chunking, concurrency control, and error handling.
2. **Metadata Bloat**: Multiple small commits generate more snapshots,
requiring more aggressive compaction and cleanup.
3. **Consistency Challenges**: Application must ensure idempotent processing
across job restarts when chunks are partially processed.
4. **Parameter Tuning**: Requires careful configuration of commit retry
parameters to handle increased concurrency.
5. **Limited Optimization**: Application has less visibility into underlying
storage than Iceberg itself would.
6. **Resource Contention**: Concurrent operations can lead to resource
contention at the metadata service level.
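
To make the parameter-tuning and metadata-bloat points concrete, here is a hedged sketch of the extra statements the workaround tends to need. The `commit.retry.*` table properties and the `expire_snapshots` procedure exist in Iceberg; the helper names, the table and catalog names, and the numeric values are illustrative assumptions, not recommendations.

```python
from typing import Dict


def retry_properties_sql(table: str, props: Dict[str, int]) -> str:
    """Build an ALTER TABLE statement that sets commit-retry table properties."""
    pairs = ", ".join(f"'{key}'='{value}'" for key, value in sorted(props.items()))
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({pairs})"


def expire_snapshots_sql(catalog: str, table: str, retain_last: int) -> str:
    """Build a CALL to Iceberg's expire_snapshots Spark procedure."""
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', retain_last => {retain_last})"
    )


# Illustrative values only; suitable settings depend on chunk concurrency.
retry_sql = retry_properties_sql(
    "db.target",
    {
        "commit.retry.num-retries": 10,
        "commit.retry.min-wait-ms": 200,
        "commit.retry.max-wait-ms": 30000,
    },
)
cleanup_sql = expire_snapshots_sql("my_catalog", "db.target", 20)
```

Both strings would be passed to `spark.sql(...)`. None of this would be necessary if a single MERGE could be split by partition internally and still produce one commit.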
---
Iceberg version: 1.9.1.
Spark version: 3.5.5.
### Query engine
Spark
### Willingness to contribute
- [ ] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with
guidance from the Iceberg community
- [x] I cannot contribute this improvement/feature at this time