github-actions[bot] commented on PR #61646:
URL: https://github.com/apache/doris/pull/61646#issuecomment-4129956027

   ## Code Review: Refactor Iceberg System Tables from TVF to Native SysTable 
Path
   
   **Reviewed**: 41 files, +723/-735 lines, commit d3bf613364
   
   ### Summary
   
   This PR refactors Iceberg system tables from the `iceberg_meta()` TVF path 
(`TvfSysTable` → `MetadataScanNode`) to the native sys-table path 
(`NativeSysTable` → `FileQueryScanNode` → `IcebergScanNode`), following the 
same pattern already established by Paimon system tables 
(`PaimonSysExternalTable`). The architecture is sound and the implementation is 
consistent with existing patterns.
   
   ---
   
   ### Critical Checkpoint Analysis (Part 1.3)
   
   **1. Goal and correctness**: The goal is to unify Iceberg system table 
access under the `table$system_table` syntax, removing the `iceberg_meta()` 
TVF. The code accomplishes this: `IcebergSysExternalTable` wraps 
`IcebergExternalTable`, derives schema from Iceberg metadata tables, and routes 
through `IcebergScanNode` with `FORMAT_JNI`. Regression tests cover 
`snapshots`, `history`, `manifests`, `files`, `partitions`, `refs`, `entries`, 
`all_data_files`, `all_delete_files`, `all_files`, `all_manifests`, 
`all_entries`, and `position_deletes`. **Pass.**
   
   **2. Modification scope**: The change is well-scoped — it adds the new path, 
removes the old TVF path, and migrates tests. No unrelated changes. **Pass.**
   
   **3. Concurrency**: Double-checked locking on `sysIcebergTable`, 
`fullSchema`, `schemaCacheValue` in `IcebergSysExternalTable` is correctly 
implemented with `volatile` fields and `synchronized` blocks, matching the 
`PaimonSysExternalTable` pattern. No new threads introduced. **No issues.**
   
   **4. Lifecycle management**: No special lifecycle concerns. 
`IcebergSysExternalTable` is ephemeral (created per-query via 
`SysTableResolver`), not cached or statically initialized. **N/A.**
   
   **5. Configuration items**: No new configuration items added. **N/A.**
   
   **6. Incompatible changes**: **Yes — breaking change.** The `iceberg_meta()` 
TVF is removed entirely. Users must migrate to the `table$system_table` syntax. 
The PR description acknowledges this. No compatibility bridge is provided, 
which is acceptable given the PR's stated intent to replace the TVF approach.
   
   **7. Parallel code paths**: The implementation mirrors 
`PaimonSysExternalTable` closely. Both paths are consistent. The `findSysTable` 
override in `IcebergExternalTable` for `position_deletes` follows the same 
dispatch pattern. **No missed parallel paths.**
   
   **8. Special conditional checks**: `isSystemTable` guards in 
`IcebergScanNode` are correctly placed at all critical dispatch points: 
`getFileFormatType()`, `isBatchMode()`, `doGetSplits()`, `setScanParams()`, 
`createTableScan()`. Each has clear purpose and is consistent with the 
surrounding code. **Pass.**
   
   **9. Test coverage**:
   - Regression tests migrated from `test_iceberg_meta.groovy` TVF to 
`table$system_table` syntax across 11 test files
   - Unit tests added: `IcebergSysTableResolverTest` (resolver logic, schema 
derivation, table type mapping) and `UserAuthenticationTest` (auth delegation 
for sys tables)
   - Some `.out` file matching replaced with runtime assertions — acceptable 
for dynamic metadata
   - Negative test coverage: error cases for invalid table types tested in 
resolver test
   - **Adequate coverage.**
   
   **10. Observability**: No new observability added. The existing scan node 
logging/profiling applies to the new path since it reuses `IcebergScanNode` and 
`FileQueryScanNode`. **Acceptable — no new critical paths that lack 
observability.**
   
   **11. Transaction/persistence**: No transaction or persistence changes. 
**N/A.**
   
   **12. Data writes**: No data write changes. **N/A.**
   
   **13. FE-BE variable passing**: New thrift field `serialized_split` (field 
8) added to `TIcebergFileDesc`. This is passed through the existing 
`TFileRangeDesc` → `TIcebergFileDesc` path. BE reads it in 
`IcebergSysTableJniReader`. The field is optional, so mixed-version 
compatibility is preserved. **Pass.**
   
   **14. Performance analysis**:
   - `ShowCreateTableCommand.resolveShowCreateTarget()` is called twice (in 
`validate()` and `doRun()`), creating redundant `IcebergSysExternalTable` 
instances. Minor inefficiency, not blocking.
   - Two identical `instanceof IcebergSysExternalTable` blocks in `Env.java` 
(`getCreateTableLikeStmt` and `getDdlStmt`) — code duplication following 
existing pattern. Minor.
   - `IcebergSysTableJniScanner` now correctly reads by 
`SelectedField.sourceIndex` instead of projection order — this is actually a 
**bug fix** for column misread when projected order differs from schema order. 
Good.
   - No hot-path performance concerns; sys table reads are metadata-only 
operations.
   
   **15. Other observations**:
   - Uses `"hms"` string literal in `IcebergSysExternalTable.toThrift()` 
matching existing `IcebergExternalTable.toThrift()` pattern (constant 
`ICEBERG_HMS = "hms"` exists but existing code also uses the literal). 
Consistent but could be improved in a follow-up.
   - `generateSysTableId` uses `sourceTableId ^ (sysTableType.hashCode() * 
31L)` — same formula as `PaimonSysExternalTable`. Weak hash but sufficient for 
ephemeral in-memory table IDs.
   - Routing safety confirmed: no conflict between normal Iceberg reads 
(`FORMAT_PARQUET/ORC`) and sys table reads (`FORMAT_JNI`) — the outer 
`switch(format_type)` discriminates first in `file_scanner.cpp`.
   
   ---
   
   ### Verdict
   
   **No blocking issues found.** The implementation is architecturally sound, 
follows established patterns (Paimon sys tables), and includes adequate test 
coverage. The breaking change (removing `iceberg_meta()` TVF) is intentional 
and documented. Minor suggestions (dedup `resolveShowCreateTarget` call, use 
constant for `"hms"`) are non-blocking.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to