szehon-ho commented on PR #16727: URL: https://github.com/apache/iceberg/pull/16727#issuecomment-4652308675
## Alternative considered: making `definition-id` user-specifiable (not chosen)

While designing this, we considered an alternative to adding a separate field (Option A, this PR): **allow `definition-id` itself to be optionally user-specified** (Option B), so it could double as the specific name.

### What Option B would look like

Instead of deriving `definition-id` strictly from the canonical parameter-type tuple, it would *default* to that canonical form but be overridable with any user-chosen, unique string. The signature used for overload resolution and the one-definition-per-signature rule would then be based solely on the `parameters` list, and `definition-id` would become an opaque identity token (which could carry the human-readable "specific name").

### Why we didn't choose it

- **It breaks a useful, checkable invariant.** Today `definition-id == canonical(parameters)`, so a reader can recompute the ID from the signature and validate it (some readers even parse it to recover types). Making it user-overridable removes that property and requires a *separate* uniqueness check on `parameters` that currently comes for free.
- **Backward-compatibility risk.** Existing consumers may assume `definition-id` equals the signature. Allowing arbitrary overrides could silently break them. Option A leaves `definition-id` completely untouched, so it's purely additive.
- **It re-conflates two concepts.** The whole motivation (#16700) is that the SQL standard keeps *signature* (for resolution) and *specific name* (for identification) separate. Option B reduces the field count but folds both roles back into one field, which is the exact conflation we're trying to address. A self-documenting, signature-derived `definition-id` plus an explicit `specific-name` keeps the two roles cleanly separated and matches the standard's model directly.
- **Footgun potential.** A user could set `definition-id: "int"` on a `float` overload, producing an ID that looks like a signature but isn't.
The canonical-by-default form is self-documenting; a free-form override is not.

Net: Option A (this PR) is fully additive, preserves the verifiable canonical `definition-id`, and is the closest match to the SQL standard's two-name model, so we went with it.
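
To make the invariant argument concrete, here is a minimal sketch of the checks involved. It is illustrative only: the `Definition` class, the helper names, and the comma-joined lowercase canonical form are assumptions made for this example, not the spec's actual encoding or any Iceberg API.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Definition:
    # Hypothetical, simplified stand-in for one function definition.
    definition_id: str
    parameter_types: Tuple[str, ...]


def canonical(parameter_types: Iterable[str]) -> str:
    # ASSUMPTION: canonical form is the lowercase type names joined by commas.
    # The real spec may encode the tuple differently.
    return ",".join(t.strip().lower() for t in parameter_types)


def is_consistent(defn: Definition) -> bool:
    # Option A invariant: the ID can be recomputed from the signature.
    return defn.definition_id == canonical(defn.parameter_types)


def duplicate_signatures(definitions: Iterable[Definition]) -> Set[str]:
    # Under Option B the ID no longer encodes the signature, so uniqueness
    # of `parameters` would need a separate pass like this one.
    seen: Set[str] = set()
    dups: Set[str] = set()
    for d in definitions:
        sig = canonical(d.parameter_types)
        if sig in seen:
            dups.add(sig)
        seen.add(sig)
    return dups


ok = Definition("int,string", ("int", "string"))
# The footgun from the list above: an ID that looks like a signature but isn't.
footgun = Definition("int", ("float",))
# An Option B style override: unique ID, but the signature collides with `ok`.
override = Definition("my_custom_name", ("int", "string"))
```

With a strictly derived `definition-id`, `is_consistent` catches the `"int"`-on-`float` case, and unique IDs imply unique signatures. With free-form IDs, `ok` and `override` have distinct IDs yet the same signature, so only the extra `duplicate_signatures` pass would reject them.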
