github-actions[bot] commented on PR #61646: URL: https://github.com/apache/doris/pull/61646#issuecomment-4129956027
## Code Review: Refactor Iceberg System Tables from TVF to Native SysTable Path **Reviewed**: 41 files, +723/-735 lines, commit d3bf613364 ### Summary This PR refactors Iceberg system tables from the `iceberg_meta()` TVF path (`TvfSysTable` → `MetadataScanNode`) to the native sys-table path (`NativeSysTable` → `FileQueryScanNode` → `IcebergScanNode`), following the same pattern already established by Paimon system tables (`PaimonSysExternalTable`). The architecture is sound and the implementation is consistent with existing patterns. --- ### Critical Checkpoint Analysis (Part 1.3) **1. Goal and correctness**: The goal is to unify Iceberg system table access under the `table$system_table` syntax, removing the `iceberg_meta()` TVF. The code accomplishes this: `IcebergSysExternalTable` wraps `IcebergExternalTable`, derives schema from Iceberg metadata tables, and routes through `IcebergScanNode` with `FORMAT_JNI`. Regression tests cover `snapshots`, `history`, `manifests`, `files`, `partitions`, `refs`, `entries`, `all_data_files`, `all_delete_files`, `all_files`, `all_manifests`, `all_entries`, and `position_deletes`. **Pass.** **2. Modification scope**: The change is well-scoped — it adds the new path, removes the old TVF path, and migrates tests. No unrelated changes. **Pass.** **3. Concurrency**: Double-checked locking on `sysIcebergTable`, `fullSchema`, `schemaCacheValue` in `IcebergSysExternalTable` is correctly implemented with `volatile` fields and `synchronized` blocks, matching the `PaimonSysExternalTable` pattern. No new threads introduced. **No issues.** **4. Lifecycle management**: No special lifecycle concerns. `IcebergSysExternalTable` is ephemeral (created per-query via `SysTableResolver`), not cached or statically initialized. **N/A.** **5. Configuration items**: No new configuration items added. **N/A.** **6. Incompatible changes**: **Yes — breaking change.** The `iceberg_meta()` TVF is removed entirely. Users must migrate to the `table$system_table` syntax. The PR description acknowledges this. No compatibility bridge is provided, which is acceptable given the PR's stated intent to replace the TVF approach. **7. Parallel code paths**: The implementation mirrors `PaimonSysExternalTable` closely. Both paths are consistent. The `findSysTable` override in `IcebergExternalTable` for `position_deletes` follows the same dispatch pattern. **No missed parallel paths.** **8. Special conditional checks**: `isSystemTable` guards in `IcebergScanNode` are correctly placed at all critical dispatch points: `getFileFormatType()`, `isBatchMode()`, `doGetSplits()`, `setScanParams()`, `createTableScan()`. Each has clear purpose and is consistent with the surrounding code. **Pass.** **9. Test coverage**: - Regression tests migrated from `test_iceberg_meta.groovy` TVF to `table$system_table` syntax across 11 test files - Unit tests added: `IcebergSysTableResolverTest` (resolver logic, schema derivation, table type mapping) and `UserAuthenticationTest` (auth delegation for sys tables) - Some `.out` file matching replaced with runtime assertions — acceptable for dynamic metadata - Negative test coverage: error cases for invalid table types tested in resolver test - **Adequate coverage.** **10. Observability**: No new observability added. The existing scan node logging/profiling applies to the new path since it reuses `IcebergScanNode` and `FileQueryScanNode`. **Acceptable — no new critical paths that lack observability.** **11. Transaction/persistence**: No transaction or persistence changes. **N/A.** **12. Data writes**: No data write changes. **N/A.** **13. FE-BE variable passing**: New thrift field `serialized_split` (field 8) added to `TIcebergFileDesc`. This is passed through the existing `TFileRangeDesc` → `TIcebergFileDesc` path. BE reads it in `IcebergSysTableJniReader`. The field is optional, so mixed-version compatibility is preserved. **Pass.** **14. Performance analysis**: - `ShowCreateTableCommand.resolveShowCreateTarget()` is called twice (in `validate()` and `doRun()`), creating redundant `IcebergSysExternalTable` instances. Minor inefficiency, not blocking. - Two identical `instanceof IcebergSysExternalTable` blocks in `Env.java` (`getCreateTableLikeStmt` and `getDdlStmt`) — code duplication following existing pattern. Minor. - `IcebergSysTableJniScanner` now correctly reads by `SelectedField.sourceIndex` instead of projection order — this is actually a **bug fix** for column misread when projected order differs from schema order. Good. - No hot-path performance concerns; sys table reads are metadata-only operations. **15. Other observations**: - Uses `"hms"` string literal in `IcebergSysExternalTable.toThrift()` matching existing `IcebergExternalTable.toThrift()` pattern (constant `ICEBERG_HMS = "hms"` exists but existing code also uses the literal). Consistent but could be improved in a follow-up. - `generateSysTableId` uses `sourceTableId ^ (sysTableType.hashCode() * 31L)` — same formula as `PaimonSysExternalTable`. Weak hash but sufficient for ephemeral in-memory table IDs. - Routing safety confirmed: no conflict between normal Iceberg reads (`FORMAT_PARQUET/ORC`) and sys table reads (`FORMAT_JNI`) — the outer `switch(format_type)` discriminates first in `file_scanner.cpp`. --- ### Verdict **No blocking issues found.** The implementation is architecturally sound, follows established patterns (Paimon sys tables), and includes adequate test coverage. The breaking change (removing `iceberg_meta()` TVF) is intentional and documented. Minor suggestions (dedup `resolveShowCreateTarget` call, use constant for `"hms"`) are non-blocking. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
