wombatu-kun commented on issue #16756:
URL: https://github.com/apache/iceberg/issues/16756#issuecomment-4687222121

Strong +1 on the framing. The reason #3681 / #4994 / #4625 stalled is that
each one asked Iceberg to *understand* these definitions in some form, and
Approach 4 sidesteps that completely: the format never reads, validates, or
evolves the data, so ownership stays in Flink where watermark and
computed-column semantics actually live. That preserves engine neutrality,
which was the core objection in all three.

One concrete point that strengthens the motivation: today this isn't even a
silent drop. `FlinkCatalog.validateFlinkTable()` hard-rejects both up front,
throwing `UnsupportedOperationException("Creating table with computed columns
is not supported yet.")` and the matching `"... watermark specs ..."` variant,
and `FlinkSchemaUtil.toResolvedSchema()` rebuilds the Flink schema with
`Collections.emptyList()` for watermarks. So the two-table workaround is a
direct consequence of that rejection, and Approach 4 is purely additive: relax the
validation, serialize the specs on `createTable`, restore them in `getTable` ->
`toCatalogTableWithProps`. No iceberg-core or spec change, and tables stay
invisible to engines that don't read the namespace.
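
To make the "purely additive" claim concrete, here is a rough sketch of the create/load round trip using only a plain properties map. Everything in it is an assumption for illustration: the key `flink.watermark-specs`, the `WatermarkSpec` record, and the Base64-based encoding are hypothetical stand-ins, not FlinkCatalog code or a proposed format. The point is only that a table written without the property still loads with an empty watermark list, which matches what the schema conversion returns today.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WatermarkRoundTripSketch {
    // Hypothetical property key; not an existing Iceberg or Flink name.
    static final String WATERMARKS_KEY = "flink.watermark-specs";

    // Minimal stand-in for a watermark spec: rowtime column plus SQL expression.
    record WatermarkSpec(String rowtimeColumn, String expression) {}

    private static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String decode(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    // Create path: store the specs as one opaque string next to the other properties.
    static Map<String, String> withWatermarks(Map<String, String> props, List<WatermarkSpec> specs) {
        Map<String, String> out = new HashMap<>(props);
        if (!specs.isEmpty()) {
            List<String> parts = new ArrayList<>();
            for (WatermarkSpec spec : specs) {
                // Base64 output never contains ':' or ',', so both are safe separators.
                parts.add(encode(spec.rowtimeColumn()) + ":" + encode(spec.expression()));
            }
            out.put(WATERMARKS_KEY, String.join(",", parts));
        }
        return out;
    }

    // Load path: tables without the property yield an empty list.
    static List<WatermarkSpec> readWatermarks(Map<String, String> props) {
        List<WatermarkSpec> specs = new ArrayList<>();
        String raw = props.get(WATERMARKS_KEY);
        if (raw == null || raw.isEmpty()) {
            return specs;
        }
        for (String part : raw.split(",")) {
            String[] fields = part.split(":", 2);
            specs.add(new WatermarkSpec(decode(fields[0]), decode(fields[1])));
        }
        return specs;
    }

    public static void main(String[] args) {
        List<WatermarkSpec> specs =
            List.of(new WatermarkSpec("ts", "ts - INTERVAL '5' SECOND"));
        Map<String, String> stored =
            withWatermarks(Map.of("write.format.default", "parquet"), specs);
        System.out.println(readWatermarks(stored).equals(specs));
        System.out.println(readWatermarks(Map.of()).isEmpty());
    }
}
```

An engine that never looks up the key sees one more string property and nothing else, which is the engine-neutrality argument in miniature.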

On the open storage question, FlinkCatalog already owns a reserved-property
mechanism (`isReservedProperty` over `connector` / `src-catalog` / `location`
from `FlinkCreateTableOptions`) that it writes on create and hides on load.
Folding the Flink metadata into a single structured, Flink-namespaced property
through that same seam would give Approach 4 the structure it wants while
avoiding a separate metadata-file lifecycle (commit atomicity, orphan cleanup).
The "unstructured / mixed with Iceberg metadata" con you list for Approach 2 is
really about ad-hoc flat keys; a single well-defined JSON blob owned and hidden
by FlinkCatalog doesn't have that problem. A dedicated file stays a clean
fallback if the blob ever outgrows properties.
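
A minimal sketch of that seam, modeled with plain maps rather than the real FlinkCatalog classes: the key `flink.metadata`, the helper names, and the JSON layout are all hypothetical, and the reserved set just mirrors the three keys named above plus the new one. It only illustrates the write-on-create, hide-on-load shape, with the blob treated as an opaque string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ReservedPropertySketch {
    // Hypothetical Flink-namespaced key for the single structured blob.
    static final String FLINK_METADATA_KEY = "flink.metadata";

    // The three keys named in the comment, plus the hypothetical blob key.
    static final Set<String> RESERVED =
        Set.of("connector", "src-catalog", "location", FLINK_METADATA_KEY);

    static boolean isReservedProperty(String key) {
        return RESERVED.contains(key);
    }

    // Create path: persist the serialized blob alongside the user's properties.
    static Map<String, String> toStoredProperties(Map<String, String> userProps, String metadataJson) {
        Map<String, String> stored = new HashMap<>(userProps);
        stored.put(FLINK_METADATA_KEY, metadataJson);
        return stored;
    }

    // Load path: strip reserved keys before handing properties back to the user.
    static Map<String, String> toVisibleProperties(Map<String, String> stored) {
        Map<String, String> visible = new HashMap<>();
        stored.forEach((key, value) -> {
            if (!isReservedProperty(key)) {
                visible.put(key, value);
            }
        });
        return visible;
    }

    // Load path: the catalog itself can still read the blob to restore the schema.
    static String flinkMetadata(Map<String, String> stored) {
        return stored.get(FLINK_METADATA_KEY);
    }

    public static void main(String[] args) {
        // Illustrative JSON layout only; no such schema is defined anywhere yet.
        String blob = "{\"watermarks\":[{\"rowtime\":\"ts\",\"expr\":\"ts - INTERVAL '5' SECOND\"}]}";
        Map<String, String> stored =
            toStoredProperties(Map.of("write.format.default", "parquet"), blob);
        System.out.println(toVisibleProperties(stored));
        System.out.println(flinkMetadata(stored));
    }
}
```

Because the blob rides in the same properties map as everything else, it is committed with the table metadata, so there is no second file to keep atomic or clean up.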
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]