sgedward opened a new pull request, #10721:
URL: https://github.com/apache/gravitino/pull/10721

   ### What changes were proposed in this pull request?
   
     When creating a Paimon table via the Flink connector with bucket options (`bucket-key`, `bucket`),
     the distribution metadata was silently dropped and not persisted in Gravitino.
   
     This PR fixes the issue by:
     - Adding a `toGravitinoDistribution` hook in `BaseCatalog.createTable` to allow
       catalog-specific distribution parsing from raw Flink table options.
     - Overriding the hook in `GravitinoPaimonCatalog` to parse `bucket-key` and `bucket`
       into a Gravitino `Distribution` object (HASH strategy).
     - Overriding `toGravitinoTableProperties` in `PaimonPropertiesConverter` to strip
       the reserved `bucket-key` and `bucket` properties before passing them to Gravitino,
       avoiding property validation errors.
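
     The parsing and stripping steps above can be sketched roughly as follows. This is
     a minimal, self-contained illustration and not the code in this PR:
     `BucketOptionsSketch`, `HashDistribution`, `parseDistribution`, `stripBucketOptions`
     and `AUTO_BUCKETS` are hypothetical names, and `HashDistribution` only stands in
     for Gravitino's `Distribution`. The sketch assumes that a missing `bucket-key`
     means "no distribution", that a missing `bucket` means an automatic bucket count
     (modelled here as -1), and that a non-numeric `bucket` is rejected; the actual
     behavior is defined by the PR's implementation and tests.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class BucketOptionsSketch {
  static final String BUCKET_KEY = "bucket-key";
  static final String BUCKET_NUM = "bucket";
  // Assumption: an absent `bucket` option is modelled as "auto" (-1).
  static final int AUTO_BUCKETS = -1;

  /** Hypothetical stand-in for a HASH distribution; not Gravitino's Distribution. */
  static final class HashDistribution {
    final int number;
    final List<String> columns;

    HashDistribution(int number, List<String> columns) {
      this.number = number;
      this.columns = columns;
    }

    @Override
    public String toString() {
      return "HASH(number=" + number + ", columns=" + columns + ")";
    }
  }

  /** Parses the raw table options; an empty result means "no distribution". */
  static Optional<HashDistribution> parseDistribution(Map<String, String> options) {
    if (options == null) {
      return Optional.empty();
    }
    String keys = options.get(BUCKET_KEY);
    if (keys == null || keys.trim().isEmpty()) {
      return Optional.empty();
    }
    // `bucket-key` may list several columns separated by commas.
    List<String> columns = Arrays.stream(keys.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
    if (columns.isEmpty()) {
      return Optional.empty();
    }
    int number = AUTO_BUCKETS;
    String raw = options.get(BUCKET_NUM);
    if (raw != null) {
      try {
        number = Integer.parseInt(raw.trim());
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("Invalid bucket number: " + raw, e);
      }
    }
    return Optional.of(new HashDistribution(number, columns));
  }

  /** Returns a copy of the options without the reserved bucket entries. */
  static Map<String, String> stripBucketOptions(Map<String, String> options) {
    Map<String, String> copy = new HashMap<>(options);
    copy.remove(BUCKET_KEY);
    copy.remove(BUCKET_NUM);
    return copy;
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("bucket-key", "id, region");
    options.put("bucket", "4");
    options.put("file.format", "parquet");
    System.out.println(parseDistribution(options));
    System.out.println(stripBucketOptions(options));
  }
}
```

     Keeping the two steps separate mirrors the split described above: the distribution
     is derived from the raw options first, and only the remaining properties are
     forwarded as table properties.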
   
     ### Why are the changes needed?
   
     The Flink connector's `createTable` always passed `Distributions.NONE` to Gravitino, so the
     `bucket-key` and `bucket` options had no effect on the stored distribution.
   
     Fix: #10368
   
     ### Does this PR introduce _any_ user-facing change?
   
     Yes:
     - Users can now specify `bucket-key` and `bucket` in Flink SQL `WITH` options when
       creating Paimon tables, and the distribution will be correctly persisted in Gravitino metadata.
     - Only HASH distribution strategy is supported.
   
     ### How was this patch tested?
   
     - Added unit tests in `TestBaseCatalog` to verify the default hook returns `Distributions.NONE`.
     - Added unit tests in `TestPaimonPropertiesConverter` to verify `getDistribution` correctly
       parses bucket options including null properties, missing bucket key, multiple bucket keys,
       auto bucket number, and invalid bucket number.
     - Added integration test `testCreateTableWithBucketDistribution` in `FlinkPaimonCatalogIT`
       to assert the distribution is correctly persisted and loaded back via Gravitino table metadata.
   

