suxiaogang223 opened a new pull request, #62006:
URL: https://github.com/apache/doris/pull/62006
### What problem does this PR solve?
**Problem Summary:**
For large Iceberg tables with thousands of manifest files, FE-local manifest
scanning becomes a bottleneck during query planning. All manifest I/O runs on
the FE, limiting planning throughput regardless of how many BE nodes are
available.
This PR introduces **distributed metadata planning**: FE partitions the
matching manifests across available BE nodes, each BE scans its assigned
manifests in parallel, and FE reconstructs the scan tasks from the results.
This allows planning to scale with cluster size for large tables.
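The partitioning step can be illustrated with a minimal sketch. It assumes a greedy, size-balanced assignment (largest manifest first, always to the least-loaded BE); the PR description does not say which strategy is actually used. The names `ManifestPartitioner`, `Manifest`, and `Bucket` are hypothetical and do not come from the Doris code base.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: these names are illustrative, not Doris classes.
final class ManifestPartitioner {

    /** Stand-in for a manifest-list entry: file path plus length in bytes. */
    static final class Manifest {
        final String path;
        final long lengthBytes;

        Manifest(String path, long lengthBytes) {
            this.path = path;
            this.lengthBytes = lengthBytes;
        }
    }

    /** Manifests assigned to one BE node, with a running byte total. */
    static final class Bucket {
        final int backendIndex;
        final List<Manifest> manifests = new ArrayList<>();
        long totalBytes = 0;

        Bucket(int backendIndex) {
            this.backendIndex = backendIndex;
        }
    }

    /**
     * Assumed strategy: sort manifests by size (descending) and give each one
     * to the bucket with the smallest byte total; ties go to the lower index.
     */
    static List<Bucket> partition(List<Manifest> manifests, int backendCount) {
        if (backendCount <= 0) {
            throw new IllegalArgumentException("backendCount must be positive");
        }
        PriorityQueue<Bucket> byLoad = new PriorityQueue<>(
                Comparator.comparingLong((Bucket b) -> b.totalBytes)
                        .thenComparingInt(b -> b.backendIndex));
        List<Bucket> buckets = new ArrayList<>();
        for (int i = 0; i < backendCount; i++) {
            Bucket bucket = new Bucket(i);
            buckets.add(bucket);
            byLoad.add(bucket);
        }
        List<Manifest> sorted = new ArrayList<>(manifests);
        sorted.sort(Comparator.comparingLong((Manifest m) -> m.lengthBytes).reversed());
        for (Manifest m : sorted) {
            // Remove, update, and re-insert so the queue sees the new total.
            Bucket least = byLoad.poll();
            least.manifests.add(m);
            least.totalBytes += m.lengthBytes;
            byLoad.add(least);
        }
        return buckets;
    }
}
```

Balancing by manifest length rather than by count is one possible choice here: manifest sizes can vary widely, so an equal count per BE would not guarantee equal scan work.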
The planning path is controlled by the session variable `iceberg_metadata_planning_mode`:
- `local` (default): FE-local planning; existing behavior is unchanged
- `distributed`: always use distributed planning
- `auto`: automatically choose based on table size and cluster capacity
In `auto` mode, Doris collects lightweight signals from manifest-list metadata
(no manifest file I/O) and applies a threshold-based algorithm to decide
whether distributed planning provides a meaningful parallelism advantage over
the local path.
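As a rough illustration of how the three modes and a threshold check could fit together, here is a hedged sketch. The signals used (matching manifest count and alive BE count) and the threshold values are placeholders assumed for this example; the PR description does not list the actual signals or thresholds. `PlanningModeChooser` is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical sketch: names and thresholds are assumptions, not Doris code.
final class PlanningModeChooser {

    /** Mirrors the three documented values of iceberg_metadata_planning_mode. */
    enum Mode { LOCAL, DISTRIBUTED, AUTO }

    // Placeholder thresholds for illustration only; NOT the values in the PR.
    static final int MIN_MANIFESTS = 1000;
    static final int MIN_BACKENDS = 2;

    /** Parses the session variable value case-insensitively. */
    static Mode parse(String value) {
        return Mode.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /**
     * Returns true when manifest scanning should be offloaded to BE nodes.
     * In AUTO mode both inputs are assumed to be derived from manifest-list
     * metadata and cluster state, without opening any manifest file.
     */
    static boolean useDistributed(Mode mode, int matchingManifests, int aliveBackends) {
        switch (mode) {
            case LOCAL:
                return false;
            case DISTRIBUTED:
                return true;
            case AUTO:
                return aliveBackends >= MIN_BACKENDS
                        && matchingManifests >= MIN_MANIFESTS;
            default:
                throw new IllegalStateException("unknown mode: " + mode);
        }
    }
}
```

For example, with these placeholder thresholds, `useDistributed(parse("auto"), 5000, 3)` selects the distributed path, while the same table on a single-BE cluster stays on the local path.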
### Release note
Support distributed Iceberg metadata planning via session variable
`iceberg_metadata_planning_mode` (values: `local` / `distributed` / `auto`,
default: `local`). In `auto` mode, Doris automatically offloads manifest
scanning to BE nodes when the table is large enough to benefit from
distributed parallelism.
### Type of change
- [ ] Bug fix
- [x] New feature
- [ ] Breaking change
- [ ] Documentation update
### Check List (For Author)
- Test
    - [ ] Regression test
    - [ ] Unit Test
    - [x] Manual test
    - [ ] No need to test or manual test. Explain why:
- Behavior changed:
    - [ ] No.
    - [x] Yes. New session variable `iceberg_metadata_planning_mode` (default `local`) controls whether manifest scanning is offloaded to BE. Existing behavior is preserved by default.
- Does this need documentation?
    - [ ] No.
    - [x] Yes. (session variable `iceberg_metadata_planning_mode` should be documented)
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label