FANNG1 commented on PR #10475: URL: https://github.com/apache/gravitino/pull/10475#issuecomment-4219491112
Thanks for working on this issue! The root cause analysis is accurate: the missing `CatalogEnvironment` / `catalogLoader` is why Hive partition metadata is not updated. However, I have some concerns about the current dual-write approach (overriding `createTable`/`alterTable`/`dropTable` to write to both Gravitino and Paimon):

1. **Consistency risk**: There is no transaction guarantee between the two writes. If the second write fails, the metadata becomes inconsistent and there is no rollback mechanism. For example, in `dropTable` the Gravitino metadata is deleted first; if the subsequent Paimon drop fails, the table becomes an orphan.
2. **Architecture concern**: Gravitino is designed as the single source of truth for metadata. DDL operations go through the Gravitino REST API, and the Gravitino server internally syncs them to the underlying catalog (Paimon). Dual-writing from the Flink connector bypasses this design.
3. **PR description vs. code mismatch**: The description states the fix is for `getTable()`/`tableExists()`, but the code actually overrides `createTable`/`alterTable`/`dropTable` instead.

**Suggested direction**: The underlying problem is that `BaseCatalog.getTable()` → `toFlinkTable()` returns a plain `CatalogTable` without Paimon-specific context. A more targeted fix would be to override `toFlinkTable()` in `GravitinoPaimonCatalog` so that it returns a Paimon `DataCatalogTable` carrying a proper `CatalogEnvironment`. This keeps Gravitino as the metadata source of truth while providing the context Paimon needs for partition handling.
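To make the consistency risk in point 1 concrete, here is a minimal self-contained sketch. It is not Gravitino or Paimon code; the class and field names (`DualWriteDemo`, `gravitinoTables`, `paimonTables`) are hypothetical stand-ins for two independent metastores with no shared transaction:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the dual-write hazard: dropTable deletes from
// the first store, the second delete fails, and there is no rollback.
public class DualWriteDemo {
    // Two independent metadata stores; nothing coordinates them.
    static Set<String> gravitinoTables = new HashSet<>();
    static Set<String> paimonTables = new HashSet<>();
    static boolean paimonDropFails = true; // simulate a failure in the second write

    static void dropTable(String name) {
        gravitinoTables.remove(name); // first write succeeds
        if (paimonDropFails) {
            // second write fails; the first write is not undone
            throw new RuntimeException("Paimon drop failed");
        }
        paimonTables.remove(name);
    }

    public static void main(String[] args) {
        gravitinoTables.add("t1");
        paimonTables.add("t1");
        try {
            dropTable("t1");
        } catch (RuntimeException e) {
            // the caller sees a failure, but the damage is already done
        }
        // Gravitino has forgotten t1 while Paimon still holds it: an orphan.
        System.out.println(!gravitinoTables.contains("t1")
                && paimonTables.contains("t1")); // prints true
    }
}
```

The same shape of failure applies to `createTable`/`alterTable`; only routing DDL through a single authority (the Gravitino server) avoids it.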
