FANNG1 commented on PR #10721:
URL: https://github.com/apache/gravitino/pull/10721#issuecomment-4303698367

   You’re right that some combinations are hard to map if we require 
`Distribution` to always have key expressions, but Gravitino’s `Distribution` 
model itself can represent a bucket number with empty expressions, e.g. 
`Distributions.hash(8)`.
   
   So maybe we can define the boundary like this:
   
   - In the Flink connector, only convert explicit options structurally.
   - If both `bucket-key` and `bucket` are present, convert to 
`Distributions.hash(bucket, bucketKeyExpressions)`.
   - If only `bucket-key` is present, convert to `Distributions.auto(HASH, 
bucketKeyExpressions)`.
   - If only `bucket` is present, convert to `Distributions.hash(bucket)` with 
empty expressions.
   - Do not do PK fallback or bucket-number semantic validation in the 
connector.
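
As a rough illustration of the connector-side rules above, here is a minimal, self-contained sketch. It does not use the real Gravitino API: `BucketOptionMapping`, `DistributionSketch`, and the local `Strategy` enum are hypothetical stand-ins, and a `null` bucket number stands for the "auto" case. The behavior when neither option is present is not covered by the list above, so returning `NONE` here is only an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BucketOptionMapping {
  // Hypothetical stand-ins for Gravitino's Distribution model; not the real API.
  enum Strategy { NONE, HASH }

  static final class DistributionSketch {
    final Strategy strategy;
    final Integer number; // null means the bucket number was not specified
    final List<String> keys; // empty means bucket-key was not specified

    DistributionSketch(Strategy strategy, Integer number, List<String> keys) {
      this.strategy = strategy;
      this.number = number;
      this.keys = keys;
    }
  }

  // Structural conversion only: no primary-key fallback and no
  // semantic validation of the bucket number.
  static DistributionSketch toDistribution(Map<String, String> options) {
    String bucketKey = options.get("bucket-key");
    String bucket = options.get("bucket");
    if (bucketKey == null && bucket == null) {
      // Assumption: with neither option set, report "no distribution".
      return new DistributionSketch(Strategy.NONE, null, List.of());
    }
    List<String> keys = new ArrayList<>();
    if (bucketKey != null) {
      for (String column : bucketKey.split(",")) {
        if (!column.trim().isEmpty()) {
          keys.add(column.trim());
        }
      }
    }
    Integer number = bucket == null ? null : Integer.valueOf(bucket.trim());
    return new DistributionSketch(Strategy.HASH, number, keys);
  }

  public static void main(String[] args) {
    DistributionSketch d = toDistribution(Map.of("bucket", "8"));
    System.out.println(d.strategy + " " + d.number + " " + d.keys);
  }
}
```

The point of the sketch is that each branch only records what the user wrote; nothing is derived from the table schema.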
   
   On the Gravitino server side, I also think we should avoid inferring too 
much Paimon-specific behavior. The Paimon catalog can convert the received 
`Distribution` back to explicit Paimon options, but it should not try to infer 
missing `bucket-key` from primary keys or full-row fallback either. For 
example, an empty-expression distribution can simply mean “bucket number is 
specified, but bucket-key is unspecified”, and the server can pass that through 
as Paimon properties where possible, letting Paimon apply its own fallback 
semantics.
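
The reverse direction on the server side could look roughly like the sketch below. `PaimonOptionPassThrough` and `toPaimonOptions` are hypothetical names, and the distribution is reduced to a nullable bucket number plus a list of key column names rather than the real Gravitino types. The only thing it is meant to show is that unspecified parts stay unspecified.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PaimonOptionPassThrough {
  // Hypothetical helper: number is null when unspecified; keys may be empty.
  static Map<String, String> toPaimonOptions(Integer number, List<String> keys) {
    Map<String, String> options = new LinkedHashMap<>();
    if (number != null) {
      options.put("bucket", String.valueOf(number));
    }
    if (!keys.isEmpty()) {
      options.put("bucket-key", String.join(",", keys));
    }
    // Deliberately no inference from primary keys or the full row:
    // a missing bucket-key is left for Paimon to resolve.
    return options;
  }

  public static void main(String[] args) {
    System.out.println(toPaimonOptions(8, List.of()));
  }
}
```

With an empty key list, only `bucket` is emitted, so Paimon's own fallback decides which columns are used.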
   
   This keeps both sides as structural adapters as much as possible, avoids 
duplicating Paimon inference rules in Gravitino, and should make future 
Spark/Trino support easier.
   

