bobhan1 opened a new pull request, #61044: URL: https://github.com/apache/doris/pull/61044
## Summary

- Register `column_data_sizes` in `BackendPartitionedSchemaScanNode.BACKEND_TABLE` so that queries to `information_schema.column_data_sizes` are distributed to all BEs instead of a randomly selected single BE.
- Previously, `column_data_sizes` was treated as a regular `SchemaScanNode`, which creates only one scan range targeting a single BE. Since each BE only collects local tablet data, the query result was incomplete and inconsistent across executions.

## Root Cause

`column_data_sizes` was not registered in `BackendPartitionedSchemaScanNode.BACKEND_TABLE`. This caused the FE planner to use the base `SchemaScanNode`, which creates a single scan range sent to one randomly chosen BE. Each BE only scans its local tablets, so only a subset of data was returned.

## Fix

Add `column_data_sizes` to the `BACKEND_TABLE` set. The table already has a `BACKEND_ID` column (lowercase `backend_id`), which matches the existing entry in `BEACKEND_ID_COLUMN_SET`, so no additional changes are needed.

## Test plan

- [ ] Query `select * from information_schema.column_data_sizes where table_id=xxx` on a multi-BE cluster and verify all BEs' data is returned
- [ ] Verify `BACKEND_ID` column shows different BE IDs in results

🤖 Generated with [Claude Code](https://claude.com/claude-code)
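
For illustration only, the behavior change described above can be modeled with a small, self-contained sketch. This is not the Doris FE code: the class `SchemaScanSketch`, the helper `planScanRanges`, the backend IDs, and the table name `unregistered_table` are all hypothetical, and the "randomly chosen BE" is replaced by the first BE so the output is deterministic. Only the name `BACKEND_TABLE` and the `column_data_sizes` entry come from the description.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SchemaScanSketch {
    // Stand-in for BackendPartitionedSchemaScanNode.BACKEND_TABLE.
    // Assumption: contents here are illustrative, not the real set.
    static final Set<String> BACKEND_TABLE = new HashSet<>();

    static {
        // The registration this PR describes.
        BACKEND_TABLE.add("column_data_sizes");
    }

    // Hypothetical helper: returns the IDs of the BEs that receive a scan range.
    static List<Long> planScanRanges(String table, List<Long> aliveBackends) {
        List<Long> targets = new ArrayList<>();
        if (aliveBackends.isEmpty()) {
            return targets;
        }
        if (BACKEND_TABLE.contains(table.toLowerCase(Locale.ROOT))) {
            // Registered table: one scan range per BE, so every BE
            // reports its local tablet data.
            targets.addAll(aliveBackends);
        } else {
            // Unregistered table: a single scan range on one BE.
            // The description says the BE is chosen randomly; the first
            // one is used here only to keep the sketch deterministic.
            targets.add(aliveBackends.get(0));
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Long> backends = List.of(10001L, 10002L, 10003L);
        System.out.println(planScanRanges("column_data_sizes", backends));  // [10001, 10002, 10003]
        System.out.println(planScanRanges("unregistered_table", backends)); // [10001]
    }
}
```

In this model, a table missing from the set only ever reaches one BE, which matches the incomplete and inconsistent results reported before the fix.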
