suxiaogang223 opened a new pull request, #62006:
URL: https://github.com/apache/doris/pull/62006

   ### What problem does this PR solve?
   
   **Problem Summary:**
   
   For large Iceberg tables with thousands of manifest files, FE-local manifest scanning becomes
   a bottleneck during query planning. All manifest I/O runs on the FE, so planning throughput
   is limited regardless of how many BE nodes are available.
   
   This PR introduces **distributed metadata planning**: the FE partitions the matching manifests
   across the available BE nodes, each BE scans its assigned manifests in parallel, and the FE
   reconstructs the scan tasks from the results. This lets planning throughput scale with cluster
   size for large tables.
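The PR does not say how manifests are assigned to BEs. A minimal sketch, assuming a greedy size-balanced assignment (largest manifest first, each to the least-loaded BE) that uses manifest file length as the cost; `ManifestPartitioner` and `ManifestInfo` are hypothetical names, not Doris's actual classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ManifestPartitioner {

    /** Hypothetical descriptor: manifest path plus file length (bytes) from the manifest list. */
    public record ManifestInfo(String path, long lengthBytes) {}

    /**
     * Greedy longest-first assignment: manifests are visited from largest to smallest,
     * and each one goes to the backend with the fewest bytes assigned so far
     * (ties broken by the lower backend index).
     */
    public static List<List<ManifestInfo>> partition(List<ManifestInfo> manifests, int numBackends) {
        if (numBackends <= 0) {
            throw new IllegalArgumentException("numBackends must be positive");
        }
        long[] load = new long[numBackends];
        List<List<ManifestInfo>> assignments = new ArrayList<>();
        // Min-heap of backend indexes keyed by the bytes assigned so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
                Comparator.comparingLong((Integer i) -> load[i]).thenComparingInt(i -> i));
        for (int be = 0; be < numBackends; be++) {
            assignments.add(new ArrayList<>());
            heap.add(be);
        }
        // Place the largest manifests first so small ones can fill the remaining gaps.
        List<ManifestInfo> sorted = new ArrayList<>(manifests);
        sorted.sort(Comparator.comparingLong(ManifestInfo::lengthBytes).reversed());
        for (ManifestInfo m : sorted) {
            int be = heap.poll();            // least-loaded backend
            assignments.get(be).add(m);
            load[be] += m.lengthBytes();     // update its key while it is out of the heap
            heap.add(be);
        }
        return assignments;
    }

    public static void main(String[] args) {
        List<ManifestInfo> manifests = List.of(
                new ManifestInfo("m1.avro", 800),
                new ManifestInfo("m2.avro", 500),
                new ManifestInfo("m3.avro", 400),
                new ManifestInfo("m4.avro", 300));
        List<List<ManifestInfo>> parts = partition(manifests, 2);
        for (int be = 0; be < parts.size(); be++) {
            System.out.println("BE " + be + " -> " + parts.get(be));
        }
    }
}
```

With these four manifests and two BEs, BE 0 receives `m1` and `m4` (1100 bytes) and BE 1 receives `m2` and `m3` (900 bytes). Balancing by bytes is only one option; an implementation could equally weight manifests by their file counts.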
   
   The planning path is controlled by the session variable `iceberg_metadata_planning_mode`:
   
   - `local` (default): FE-local planning; existing behavior is unchanged
   - `distributed`: always use distributed planning
   - `auto`: choose automatically based on table size and cluster capacity
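The three accepted values can be modeled as a small enum. This is a hypothetical sketch: `IcebergMetadataPlanningMode` and `fromSessionValue` are illustrative names, and it assumes that values are matched case-insensitively, that an unset value falls back to the documented default, and that unknown values are rejected.

```java
import java.util.Locale;

/** Hypothetical enum mirroring the documented values of iceberg_metadata_planning_mode. */
public enum IcebergMetadataPlanningMode {
    LOCAL,        // FE-local planning (documented default)
    DISTRIBUTED,  // always offload manifest scanning to BEs
    AUTO;         // decide per query from manifest-list signals

    /** Parses a session-variable value case-insensitively. */
    public static IcebergMetadataPlanningMode fromSessionValue(String value) {
        if (value == null || value.isBlank()) {
            return LOCAL; // assumption: an unset value falls back to the documented default
        }
        return switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "local" -> LOCAL;
            case "distributed" -> DISTRIBUTED;
            case "auto" -> AUTO;
            default -> throw new IllegalArgumentException(
                    "Invalid iceberg_metadata_planning_mode: '" + value
                            + "' (expected local, distributed, or auto)");
        };
    }

    public static void main(String[] args) {
        System.out.println(fromSessionValue("AUTO")); // AUTO
        System.out.println(fromSessionValue(null));   // LOCAL
    }
}
```

Rejecting unknown strings up front, rather than silently treating them as `local`, means a typo in the session variable surfaces as an error instead of an unexpected planning path.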
   
   In `auto` mode, Doris collects lightweight signals from manifest-list metadata (no manifest
   file I/O) and applies a threshold-based algorithm to decide whether distributed planning
   provides a meaningful parallelism advantage over the local path.
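The PR does not publish its auto-mode thresholds or formula. The sketch below shows one plausible shape for such a rule. `AutoPlanningDecider`, `ManifestListSignals`, every threshold constant, and the equal-threads-per-node model are all hypothetical. The rule only uses per-manifest counts that an Iceberg manifest list already carries, such as each manifest's length, so no manifest file is opened.

```java
public final class AutoPlanningDecider {

    /** Hypothetical signals summed from manifest-list entries only (no manifest file reads). */
    public record ManifestListSignals(int manifestCount, long totalManifestBytes) {}

    // Hypothetical thresholds; the PR does not state the values it uses.
    static final int MIN_MANIFESTS = 64;
    static final long MIN_MANIFEST_BYTES = 256L * 1024 * 1024;
    static final int MIN_BACKENDS = 2;
    static final double MIN_PARALLELISM_GAIN = 2.0;

    private AutoPlanningDecider() {}

    /**
     * Returns true when distributed planning is expected to offer a meaningful
     * parallelism gain. Assumes the FE and every BE scan with the same thread count.
     */
    public static boolean useDistributed(ManifestListSignals s, int aliveBackends, int scanThreadsPerNode) {
        if (scanThreadsPerNode <= 0) {
            throw new IllegalArgumentException("scanThreadsPerNode must be positive");
        }
        // Too few BEs, or nothing to scan: no benefit from fanning out.
        if (aliveBackends < MIN_BACKENDS || s.manifestCount() == 0) {
            return false;
        }
        // Small tables (assumption): fan-out overhead is unlikely to pay off.
        if (s.manifestCount() < MIN_MANIFESTS && s.totalManifestBytes() < MIN_MANIFEST_BYTES) {
            return false;
        }
        // Usable parallelism on either path is capped by the number of manifests.
        int local = Math.min(scanThreadsPerNode, s.manifestCount());
        long distributed = Math.min((long) aliveBackends * scanThreadsPerNode, s.manifestCount());
        return (double) distributed / local >= MIN_PARALLELISM_GAIN;
    }

    public static void main(String[] args) {
        ManifestListSignals large = new ManifestListSignals(2000, 1L << 30);
        ManifestListSignals small = new ManifestListSignals(10, 1L << 20);
        System.out.println(useDistributed(large, 8, 16)); // true: 128 vs 16 usable threads
        System.out.println(useDistributed(small, 8, 16)); // false: below both size thresholds
    }
}
```

The `min(..., manifestCount)` caps are the key design choice. With 70 matching manifests on 8 BEs, 16 threads per node gives a gain of about 4.4x. With 64 threads per node, the gain is only about 1.1x, because the FE alone can already scan nearly all of the manifests at once.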
   
   ### Release note
   
   Support distributed Iceberg metadata planning via the session variable
   `iceberg_metadata_planning_mode` (values: `local` / `distributed` / `auto`; default: `local`).
   In `auto` mode, Doris automatically offloads manifest scanning to BE nodes when the table is
   large enough to benefit from distributed parallelism.
   
   ### Type of change
   
   - [ ] Bug fix
   - [x] New feature
   - [ ] Breaking change
   - [ ] Documentation update
   
   ### Check List (For Author)
   
   - Test
       - [ ] Regression test
       - [ ] Unit Test
       - [x] Manual test
       - [ ] No need to test or manual test. Explain why:
   
   - Behavior changed:
       - [ ] No.
       - [x] Yes.
         New session variable `iceberg_metadata_planning_mode` (default `local`) controls whether
         manifest scanning is offloaded to BE nodes. Existing behavior is preserved by default.
   
   - Does this need documentation?
       - [ ] No.
       - [x] Yes. The session variable `iceberg_metadata_planning_mode` should be documented.
   
   ### Check List (For Reviewer who merges this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label

