sgedward commented on PR #10721:
URL: https://github.com/apache/gravitino/pull/10721#issuecomment-4293276751

   Hi @FANNG1 
   
   Thanks for the detailed explanation — I agree we should defer to server-side 
validation.
   
   After reviewing some of the documentation, I have a few questions about how 
`DISTRIBUTION` aligns with the Paimon options.
   
   The Paimon core configuration options list the following:
   
   **Bucket:** a positive integer, `-1`, or `-2`
   
   **Bucket-key:** Specifies the Paimon distribution policy. Data is assigned 
to each bucket according to the hash value of `bucket-key`.
   - If you specify multiple fields, the delimiter is `,`.
   - If not specified, the primary key will be used; if there is no primary 
key, the full row will be used.
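
To make the fallback order concrete, here is a minimal sketch of the rule as quoted above (explicit `bucket-key`, then primary key, then full row). The class `BucketKeyFallback` and the method `effectiveBucketKey` are hypothetical names for illustration only. They are not Paimon or Gravitino APIs, and the real implementation may differ in details.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the quoted fallback rule; not Paimon or Gravitino code.
final class BucketKeyFallback {

  // Returns the columns that would be hashed to pick a bucket, per the quoted rule.
  static List<String> effectiveBucketKey(
      String bucketKeyOption, List<String> primaryKeys, List<String> allColumns) {
    // 1. An explicit bucket-key wins; multiple fields are separated by ','.
    if (bucketKeyOption != null && !bucketKeyOption.trim().isEmpty()) {
      return Arrays.stream(bucketKeyOption.split(","))
          .map(String::trim)
          .filter(name -> !name.isEmpty())
          .collect(Collectors.toList());
    }
    // 2. Otherwise fall back to the primary key, if the table has one.
    if (primaryKeys != null && !primaryKeys.isEmpty()) {
      return primaryKeys;
    }
    // 3. Otherwise the full row is used.
    return allColumns;
  }
}
```

Under this reading, the function always returns some key as long as the table has at least one column.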
   
   ---
   
   From these option combinations, I put together the table below. It seems that 
`AUTO` and `NONE` behave the same way, except that `AUTO` cannot accept an empty 
key expression. I also wonder: since `bucket-key` falls back to the full row when 
nothing else is available, do we still need to check whether a PK fallback exists?
   
   
https://github.com/apache/gravitino/blob/aa1586b422730aaf9d28265fff62d31f0c1ee29b/catalogs/catalog-lakehouse-paimon/src/main/java/org/apache/gravitino/catalog/lakehouse/paimon/PaimonCatalogOperations.java#L533-L539
   
   | bucket   | bucket-key | Flink connector Distribution     |
   |----------|------------|----------------------------------|
   | blank    | missing    | NONE                             |
   | blank    | exists     | AUTO(-1, [key])                  |
   | -1       | missing    | hash(-1)                         |
   | -1       | exists     | hash(-1, [key]) → server rejects |
   | -2       | missing    | hash(-2)                         |
   | -2       | exists     | hash(-2, [key])                  |
   | >0       | missing    | hash(N)                          |
   | >0       | exists     | hash(N, [key])                   |
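
For discussion, the mapping in the table could be sketched roughly as follows. This is only an illustration of the table's rows: `DistributionMappingSketch` and `describe` are hypothetical names, the returned strings mirror the table's notation rather than any real Gravitino or Flink type, and rejecting `bucket = -1` with a key is left to the server, as the table notes.

```java
// Hypothetical sketch of the table's mapping; not the connector's actual code.
final class DistributionMappingSketch {

  // bucket == null models a blank 'bucket' option; bucketKey == null or empty models a missing key.
  static String describe(Integer bucket, String bucketKey) {
    boolean hasKey = bucketKey != null && !bucketKey.trim().isEmpty();
    if (bucket == null) {
      // Blank bucket: NONE without a key, AUTO(-1, [key]) with one.
      return hasKey ? "AUTO(-1, [" + bucketKey + "])" : "NONE";
    }
    if (bucket == 0 || bucket < -2) {
      // Only a positive integer, -1, and -2 appear in the table.
      throw new IllegalArgumentException("unsupported bucket value: " + bucket);
    }
    // Explicit bucket: hash(N) or hash(N, [key]); validation is deferred to the server.
    return hasKey ? "hash(" + bucket + ", [" + bucketKey + "])" : "hash(" + bucket + ")";
  }
}
```

For example, `describe(null, "id")` yields `AUTO(-1, [id])`, while `describe(-1, "id")` yields `hash(-1, [id])`, which the server would then reject.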


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
