FANNG1 commented on PR #10721: URL: https://github.com/apache/gravitino/pull/10721#issuecomment-4303698367
You’re right that some combinations are hard to map if we require `Distribution` to always have key expressions, but Gravitino’s `Distribution` model itself can represent a bucket number with empty expressions, e.g. `Distributions.hash(8)`.

So maybe we can define the boundary like this:

- In the Flink connector, only convert explicit options structurally.
- If both `bucket-key` and `bucket` are present, convert to `Distribution.hash(bucket, bucketKeyExpressions)`.
- If only `bucket-key` is present, convert to `Distribution.auto(HASH, bucketKeyExpressions)`.
- If only `bucket` is present, convert to `Distribution.hash(bucket)` with empty expressions.
- Do not do PK fallback or bucket-number semantic validation in the connector.

On the Gravitino server side, I also think we should avoid inferring too much Paimon-specific behavior. The Paimon catalog can convert the received `Distribution` back to explicit Paimon options, but it should not try to infer missing `bucket-key` from primary keys or full-row fallback either. For example, an empty-expression distribution can simply mean “bucket number is specified, but bucket-key is unspecified”, and the server can pass that through as Paimon properties where possible, letting Paimon apply its own fallback semantics.

This keeps both sides as structural adapters as much as possible, avoids duplicating Paimon inference rules in Gravitino, and should make future Spark/Trino support easier.
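To make the proposed boundary concrete, here is a rough, self-contained sketch of the two structural conversions described above. It deliberately does not use the real Gravitino classes: `BucketOptionMapping`, `Dist`, and `Strategy` are hypothetical stand-ins, an empty `OptionalInt` is an assumed way to model the "auto" bucket number, and returning a "no distribution" value when neither option is set is also an assumption, since the comment does not cover that case.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

public class BucketOptionMapping {

  // Hypothetical stand-in for Gravitino's distribution strategy.
  public enum Strategy { NONE, HASH }

  // Hypothetical stand-in for a Gravitino Distribution.
  // An empty `number` models "bucket number unspecified" (auto).
  public record Dist(Strategy strategy, OptionalInt number, List<String> keys) {}

  // Assumption: no bucket options at all maps to "no distribution".
  public static final Dist NO_DISTRIBUTION =
      new Dist(Strategy.NONE, OptionalInt.empty(), List.of());

  // Connector side: convert only what is explicitly present.
  // No primary-key fallback and no check of what the bucket number means.
  public static Dist toDistribution(Map<String, String> options) {
    String bucket = options.get("bucket");
    String bucketKey = options.get("bucket-key");
    if (bucket == null && bucketKey == null) {
      return NO_DISTRIBUTION;
    }
    List<String> keys = bucketKey == null
        ? List.of()
        : Arrays.stream(bucketKey.split(",")).map(String::trim).toList();
    OptionalInt number = bucket == null
        ? OptionalInt.empty()
        : OptionalInt.of(Integer.parseInt(bucket.trim()));
    return new Dist(Strategy.HASH, number, keys);
  }

  // Server side: turn the distribution back into explicit Paimon options.
  // Missing parts stay missing so Paimon can apply its own fallback rules.
  public static Map<String, String> toPaimonOptions(Dist dist) {
    Map<String, String> options = new LinkedHashMap<>();
    if (dist.strategy() == Strategy.NONE) {
      return options;
    }
    if (dist.number().isPresent()) {
      options.put("bucket", String.valueOf(dist.number().getAsInt()));
    }
    if (!dist.keys().isEmpty()) {
      options.put("bucket-key", String.join(",", dist.keys()));
    }
    return options;
  }
}
```

With this shape, each of the three explicit combinations (`bucket` only, `bucket-key` only, both) round-trips through the distribution unchanged, and neither side has to know how Paimon picks a bucket key when one is not given.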
