suxiaogang223 opened a new pull request, #61646:
URL: https://github.com/apache/doris/pull/61646

   ### What problem does this PR solve?
   
   Issue Number: N/A
   
   Related PR: N/A
   
   ## Summary
   
   - Refactor Iceberg system tables (`$snapshots`, `$history`, `$manifests`, `$files`, `$entries`, `$metadata_log_entries`, `$partitions`, `$refs`) to use the native sys-table path (`NativeSysTable -> FileQueryScanNode -> IcebergScanNode`) instead of the TVF path (`iceberg_meta()` -> `MetadataScanNode`)
   - Introduce `IcebergSysExternalTable` as the native wrapper for Iceberg metadata tables and route query / describe / auth / show-create flows through the source Iceberg table
   - Keep `position_deletes` explicitly unsupported: querying it still fails with the existing `SysTable position_deletes is not supported yet` error
   - Add the minimum BE protocol / scanner changes required to read Iceberg system-table splits through `FORMAT_JNI`
   - Remove the `iceberg_meta()` TVF exposure and migrate regression coverage to direct `table$system_table` access
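
   Here is a minimal sketch of how a `table$system_table` reference could be split into the source table and the metadata-table name. It assumes the suffix after the last `$` is the system-table name. The class `SysTableNameResolver`, the `Resolved` record, the exception types, and the unknown-name error are hypothetical illustrations, not the `SysTableResolver` code in this PR. Only the supported names and the `position_deletes` message come from the summary above.

   ```java
   import java.util.Optional;
   import java.util.Set;

   class SysTableNameResolver {
       // System tables this PR lists as supported on the native path.
       static final Set<String> SUPPORTED = Set.of(
               "snapshots", "history", "manifests", "files", "entries",
               "metadata_log_entries", "partitions", "refs");

       // Hypothetical result type: source table name plus metadata-table name.
       record Resolved(String sourceTable, String sysTable) {}

       // Returns empty for plain table names; throws for unsupported system tables.
       static Optional<Resolved> resolve(String name) {
           int idx = name.lastIndexOf('$'); // assumption: the last '$' separates the suffix
           if (idx <= 0 || idx == name.length() - 1) {
               return Optional.empty();
           }
           String source = name.substring(0, idx);
           String sys = name.substring(idx + 1);
           if (sys.equals("position_deletes")) {
               // Message quoted from the PR; the exception type is an assumption.
               throw new UnsupportedOperationException("SysTable position_deletes is not supported yet");
           }
           if (!SUPPORTED.contains(sys)) {
               throw new IllegalArgumentException("unknown system table: " + sys); // hypothetical message
           }
           return Optional.of(new Resolved(source, sys));
       }
   }
   ```

   For example, `resolve("orders$snapshots")` yields source `orders` and system table `snapshots`. `resolve("orders")` returns empty, so the name falls through to a normal table lookup.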
   
   ## Motivation
   
   Previously, Iceberg system tables were queried through `iceberg_meta()`, which:
   - Forced system-table queries onto the `MetadataScanNode` path instead of the regular file-query path
   - Kept Iceberg system table planning and execution separate from normal Iceberg scans
   - Required dedicated TVF-only coverage and made auth / describe / show-create handling more fragmented
   - Made it harder to unify planner behavior with other native system table implementations
   
   This refactor moves Iceberg system tables onto the native path while keeping the first implementation pragmatic: metadata rows are still produced by the existing Java Iceberg scanner, but planning, privilege checks, and scan-node routing now match the native execution model.
   
   ## Architecture After Refactoring
   
   ### Native SysTable Path
   
   ```
   User: SELECT * FROM table$snapshots
   
   BindRelation / SysTableResolver
     -> IcebergSysTable.createSysExternalTable(...)
     -> IcebergSysExternalTable
     -> LogicalFileScan
     -> IcebergScanNode
     -> FORMAT_JNI file ranges with serialized_split
     -> IcebergSysTableJniReader / IcebergSysTableJniScanner
   ```
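
   The FE half of this path can be sketched as follows. It creates one `FORMAT_JNI` range per metadata split. Each range carries exactly one Base64-encoded split in a `serialized_split` field, with `table_format_type` set to `iceberg`. `MetaSplit`, `FileRange`, and the use of plain Java serialization are assumptions for illustration. The real planner works with Iceberg scan tasks and the thrift `TIcebergFileDesc`.

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.ObjectInputStream;
   import java.io.ObjectOutputStream;
   import java.io.Serializable;
   import java.io.UncheckedIOException;
   import java.util.ArrayList;
   import java.util.Base64;
   import java.util.List;

   class SysTableRangePlanner {
       // Hypothetical stand-in for one Iceberg metadata-table split.
       record MetaSplit(String metadataTable, int index) implements Serializable {}

       // Hypothetical mirror of the range fields this PR mentions.
       record FileRange(String formatType, String tableFormatType, String serializedSplit) {}

       // Java-serializes one split and Base64-encodes it for transport.
       static String serialize(Serializable split) {
           ByteArrayOutputStream bytes = new ByteArrayOutputStream();
           try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
               out.writeObject(split);
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           }
           return Base64.getEncoder().encodeToString(bytes.toByteArray());
       }

       // Inverse of serialize(), i.e. what a scanner would do with one payload.
       static Object deserialize(String encoded) {
           byte[] data = Base64.getDecoder().decode(encoded);
           try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
               return in.readObject();
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           } catch (ClassNotFoundException e) {
               throw new IllegalStateException(e);
           }
       }

       // One FORMAT_JNI range per split, each holding exactly one serialized split.
       static List<FileRange> planRanges(List<MetaSplit> splits) {
           List<FileRange> ranges = new ArrayList<>();
           for (MetaSplit split : splits) {
               ranges.add(new FileRange("FORMAT_JNI", "iceberg", serialize(split)));
           }
           return ranges;
       }
   }
   ```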
   
   ### Key Changes
   
   - `IcebergSysTable` now extends `NativeSysTable` instead of `TvfSysTable`
   - `IcebergSysExternalTable` wraps the source `IcebergExternalTable`, derives schema from the Iceberg metadata table, and reuses the source table descriptor for thrift
   - `IcebergApiSource` / `IcebergScanNode` now recognize both normal Iceberg external tables and Iceberg system tables
   - `TIcebergFileDesc` adds `serialized_split`; FE sends one serialized split per system-table range
   - BE `file_scanner.cpp` now routes `FORMAT_JNI + table_format_type=iceberg` to `IcebergSysTableJniReader`
   - `IcebergSysTableJniReader` and `IcebergSysTableJniScanner` accept both the legacy `serialized_splits` payload and the new single `serialized_split` payload
   - `PhysicalPlanTranslator`, `UserAuthentication`, and `SHOW CREATE TABLE` now resolve Iceberg system tables through the source table correctly
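
   On the reader side, accepting both payloads can be sketched like this: prefer the new single `serialized_split` value and fall back to the legacy `serialized_splits` value. The parameter map and the comma-separated legacy encoding are assumptions. Base64 output never contains a comma, so splitting on commas is unambiguous. The actual `IcebergSysTableJniScanner` parameter handling may differ.

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.Map;

   class SplitPayloadReader {
       static final String SINGLE_KEY = "serialized_split";   // new: one split per range
       static final String LEGACY_KEY = "serialized_splits";  // legacy: several splits

       // Returns the encoded splits this range should read, new payload first.
       static List<String> encodedSplits(Map<String, String> params) {
           String single = params.get(SINGLE_KEY);
           if (single != null && !single.isEmpty()) {
               return List.of(single);
           }
           String legacy = params.get(LEGACY_KEY);
           if (legacy != null && !legacy.isEmpty()) {
               // Assumption: legacy payload is a comma-separated list of Base64 strings.
               return Arrays.asList(legacy.split(","));
           }
           throw new IllegalArgumentException(
                   "expected " + SINGLE_KEY + " or " + LEGACY_KEY + " in scanner params");
       }
   }
   ```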
   
   ### TVF Removal
   
   - Remove `IcebergTableValuedFunction`
   - Remove Nereids `IcebergMeta`
   - Remove builtin `iceberg_meta()` registration
   - Delete the obsolete `test_iceberg_meta` regression case and migrate Iceberg system-table regression checks to direct `$system_table` access
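
   For queries migrated by hand, the rewrite is mechanical. The sketch below assumes the removed TVF was called as `iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")`. The helper `IcebergMetaMigration.migrate` is hypothetical.

   ```java
   import java.util.Map;

   class IcebergMetaMigration {
       // Hypothetical helper: maps iceberg_meta() arguments to a direct system-table query.
       static String migrate(Map<String, String> tvfArgs) {
           String table = tvfArgs.get("table");          // e.g. "ctl.db.tbl"
           String queryType = tvfArgs.get("query_type"); // e.g. "snapshots"
           if (table == null || queryType == null) {
               throw new IllegalArgumentException("expected \"table\" and \"query_type\" arguments");
           }
           return "SELECT * FROM " + table + "$" + queryType;
       }
   }
   ```

   For example, `migrate(Map.of("table", "ctl.db.tbl", "query_type", "snapshots"))` returns `SELECT * FROM ctl.db.tbl$snapshots`.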
   
   ## Test Plan
   
   - [x] FE build
       - `DISABLE_BUILD_UI=ON DISABLE_BUILD_HIVE_UDF=ON DISABLE_BE_JAVA_EXTENSIONS=ON ./build.sh --fe -j8`
   - [x] FE unit tests
       - `bash run-fe-ut.sh --run org.apache.doris.datasource.systable.IcebergSysTableResolverTest,org.apache.doris.nereids.rules.analysis.UserAuthenticationTest`
   - [ ] BE build
       - Not completed locally: this environment could not fetch the required `apache-orc` dependency from outside the sandbox
   - [ ] Regression test
       - Not completed locally in this environment
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [x] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [x] Other reason <!-- Add your reason?  -->
               - BE build and regression execution were not completed locally because this environment blocks the required dependency fetch / cluster connectivity.
   
   - Behavior changed:
       - [x] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [x] No.
       - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
   

