sgedward opened a new pull request, #10721:
URL: https://github.com/apache/gravitino/pull/10721
### What changes were proposed in this pull request?
When creating a Paimon table via the Flink connector with bucket options (`bucket-key`, `bucket`), the distribution metadata was silently dropped and not persisted in Gravitino.
This PR fixes the issue by:
- Adding a `toGravitinoDistribution` hook in `BaseCatalog.createTable` to allow catalog-specific distribution parsing from raw Flink table options.
- Overriding the hook in `GravitinoPaimonCatalog` to parse `bucket-key` and `bucket` into a Gravitino `Distribution` object (HASH strategy).
- Overriding `toGravitinoTableProperties` in `PaimonPropertiesConverter` to strip the reserved `bucket-key` and `bucket` properties before passing them to Gravitino, avoiding property validation errors.
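
As a rough illustration of the parse-then-strip flow described above, here is a minimal, self-contained sketch. It is not the code in this PR: `BucketOptionsSketch`, `HashDistribution`, `parse`, and `stripReserved` are hypothetical names, `HashDistribution` is a plain stand-in for Gravitino's `Distribution`, and the handling of edge cases (returning `null` when `bucket-key` is missing, defaulting to `-1` when `bucket` is absent, throwing on a non-numeric `bucket`) is an assumption rather than the PR's confirmed behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BucketOptionsSketch {
  public static final String BUCKET_KEY = "bucket-key";
  public static final String BUCKET = "bucket";
  // Assumption: -1 stands for "bucket number not fixed by the user".
  public static final int AUTO_BUCKET = -1;

  /** Hypothetical stand-in for a Gravitino HASH distribution. */
  public static final class HashDistribution {
    public final int number;
    public final List<String> fields;

    HashDistribution(int number, List<String> fields) {
      this.number = number;
      this.fields = fields;
    }
  }

  /** Parses bucket options; returns null when no bucket key is configured (assumption). */
  public static HashDistribution parse(Map<String, String> options) {
    if (options == null) {
      return null;
    }
    String keys = options.get(BUCKET_KEY);
    if (keys == null || keys.trim().isEmpty()) {
      return null;
    }
    // A comma-separated bucket-key yields one field per column name.
    List<String> fields =
        Arrays.stream(keys.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toList());
    if (fields.isEmpty()) {
      return null;
    }
    int number = AUTO_BUCKET;
    String raw = options.get(BUCKET);
    if (raw != null) {
      try {
        number = Integer.parseInt(raw.trim());
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("Invalid bucket number: " + raw, e);
      }
    }
    return new HashDistribution(number, fields);
  }

  /** Returns a copy of the options without the reserved bucket properties. */
  public static Map<String, String> stripReserved(Map<String, String> options) {
    Map<String, String> copy = new HashMap<>(options);
    copy.remove(BUCKET_KEY);
    copy.remove(BUCKET);
    return copy;
  }
}
```

In this sketch the two steps are independent: the distribution is derived from the raw options first, and only the stripped copy would be forwarded as table properties, so the reserved keys never reach property validation.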
### Why are the changes needed?
The Flink connector's `createTable` always passed `Distributions.NONE`, so the `bucket-key` and `bucket` options were not supported.
Fix: #10368
### Does this PR introduce _any_ user-facing change?
Yes:
- Users can now specify `bucket-key` and `bucket` in Flink SQL `WITH` options when creating Paimon tables, and the distribution will be correctly persisted in Gravitino metadata.
- Only HASH distribution strategy is supported.
### How was this patch tested?
- Added unit tests in `TestBaseCatalog` to verify the default hook returns `Distributions.NONE`.
- Added unit tests in `TestPaimonPropertiesConverter` to verify `getDistribution` correctly parses bucket options, including null properties, missing bucket key, multiple bucket keys, auto bucket number, and invalid bucket number.
- Added integration test `testCreateTableWithBucketDistribution` in `FlinkPaimonCatalogIT` to assert the distribution is correctly persisted and loaded back via Gravitino table metadata.