sgedward commented on PR #10721: URL: https://github.com/apache/gravitino/pull/10721#issuecomment-4293276751
Hi @FANNG1

Thanks for the detailed explanation. I agree we should defer to server-side validation. After reviewing some of the documents, I have a few questions regarding the `DISTRIBUTION` alignment with Paimon options.

The Paimon Core config options list:

**Bucket:** a positive number, `-1`, or `-2`

**Bucket-key:** Specifies the Paimon distribution policy. Data is assigned to each bucket according to the hash value of `bucket-key`.
- If you specify multiple fields, the delimiter is `,`.
- If not specified, the primary key will be used; if there is no primary key, the full row will be used.

---

From the option combinations above, I created the table below. It seems that `AUTO` and `NONE` behave the same way, except that `AUTO` cannot accept an empty key expression. I also wonder: since `bucket-key` falls back to the full row at minimum, do we still need to check whether a PK fallback exists?

https://github.com/apache/gravitino/blob/aa1586b422730aaf9d28265fff62d31f0c1ee29b/catalogs/catalog-lakehouse-paimon/src/main/java/org/apache/gravitino/catalog/lakehouse/paimon/PaimonCatalogOperations.java#L533-L539

| bucket | bucket-key | Flink connector Distribution     |
|--------|------------|----------------------------------|
| blank  | missing    | NONE                             |
| blank  | exists     | AUTO(-1, [key])                  |
| -1     | missing    | hash(-1)                         |
| -1     | exists     | hash(-1, [key]) → server rejects |
| -2     | missing    | hash(-2)                         |
| -2     | exists     | hash(-2, [key])                  |
| >0     | missing    | hash(N)                          |
| >0     | exists     | hash(N, [key])                   |

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
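To make the `bucket` / `bucket-key` table in the comment above easier to check row by row, here is a minimal Java sketch of that mapping. It is only an illustration: the class `BucketDistributionSketch` and its methods are hypothetical names, not Gravitino or Paimon APIs, and the returned strings simply mirror the labels used in the table. The sketch assumes the raw option values arrive as strings and does not model the server-side rejection of `hash(-1, [key])`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper mirroring the bucket / bucket-key table; not a Gravitino API. */
public class BucketDistributionSketch {

  /** Splits a comma-delimited bucket-key value into trimmed field names. */
  static List<String> parseBucketKey(String bucketKey) {
    if (bucketKey == null || bucketKey.trim().isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.stream(bucketKey.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
  }

  /** Returns the table's distribution label for one (bucket, bucket-key) pair. */
  static String describe(String bucket, String bucketKey) {
    List<String> keys = parseBucketKey(bucketKey);
    boolean bucketBlank = bucket == null || bucket.trim().isEmpty();

    if (bucketBlank) {
      // Rows 1-2: no bucket option set.
      return keys.isEmpty() ? "NONE" : "AUTO(-1, " + keys + ")";
    }

    int n = Integer.parseInt(bucket.trim());
    // The comment lists only positive values, -1, and -2 as valid.
    if (n == 0 || n < -2) {
      throw new IllegalArgumentException("bucket must be positive, -1, or -2: " + n);
    }

    // Rows 3-8. Note: the table says the server rejects hash(-1, [key]);
    // that validation is intentionally left out of this sketch.
    return keys.isEmpty() ? "hash(" + n + ")" : "hash(" + n + ", " + keys + ")";
  }
}
```

For example, `describe(null, "a, b")` yields `AUTO(-1, [a, b])` and `describe("4", null)` yields `hash(4)`, matching the second and seventh rows of the table.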
