matrixsparse opened a new issue, #2848: URL: https://github.com/apache/fluss/issues/2848
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/fluss/issues) and found nothing similar.

### Motivation

Fluss 0.9 introduced the Aggregation Merge Engine with `rbm32` and `rbm64` aggregate functions, which enables storage-level precise UV counting via RoaringBitmap.

However, the current usage requires **manual client-side serialization** before inserting into `rbm32`/`rbm64` columns. This is because `FieldRoaringBitmap64Agg.agg()` expects both `accumulator` and `inputField` to be pre-serialized `byte[]` of a `Roaring64Bitmap`. As a result, users must either:

1. Write Java application code to serialize each `userId` into a single-element bitmap `byte[]`, or
2. Implement and register a custom Flink `ScalarFunction` UDF for every project.

This creates a significant barrier to adoption, especially for:

- Users writing Flink SQL jobs who cannot easily embed custom Java code
- Users connecting from non-Java clients (Python, Go)
- The **Dictionary Table + RoaringBitmap UV counting** pattern recommended in the 0.9 release notes, which requires converting an `INT` / `BIGINT` auto-increment ID into a serialized bitmap

**Comparison with peer systems:**

| System | Built-in bitmap functions |
|--------|--------------------------|
| ClickHouse | `bitmapBuild()`, `bitmapCardinality()`, `bitmapOr()` |
| Apache Doris | `bitmap_from_array()`, `bitmap_count()`, `bitmap_union()` |
| Apache Paimon | Requires manual serialization (same limitation) |
| Apache Fluss | **Requires manual serialization (current state)** |

## Proposed Solution

Add the following built-in functions to the **Flink connector** (`fluss-flink-common` module):

### Construction functions

```sql
-- Create a single-element 32-bit bitmap from an INT value
rbm32_build(value INT) → BYTES

-- Create a single-element 64-bit bitmap from a BIGINT value
rbm64_build(value BIGINT) → BYTES
```

### Cardinality functions

```sql
-- Get the number of distinct elements from a serialized 32-bit bitmap
rbm32_cardinality(bitmap BYTES) → BIGINT

-- Get the number of distinct elements from a serialized 64-bit bitmap
rbm64_cardinality(bitmap BYTES) → BIGINT
```

### Solution

_No response_

### Anything else?

_No response_

### Willingness to contribute

- [ ] I'm willing to submit a PR!
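To make the manual serialization step from the Motivation section concrete, here is a minimal, dependency-free sketch of what building a single-element 32-bit bitmap involves. It assumes that `rbm32` columns hold the standard portable RoaringBitmap serialization, which this issue does not confirm, and it does not cover the 64-bit layout used by `Roaring64Bitmap`. The class `SingleValueRbm32` and its methods are hypothetical names for illustration only; real client code would normally call the RoaringBitmap library rather than write bytes by hand.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration; not part of Fluss or the RoaringBitmap library. */
final class SingleValueRbm32 {

    // Cookie of the portable 32-bit Roaring format when no run containers are present.
    private static final int SERIAL_COOKIE_NO_RUNCONTAINER = 12346;

    private SingleValueRbm32() {}

    /** Serializes a bitmap holding exactly one value (18 bytes, little-endian). */
    static byte[] build(int value) {
        ByteBuffer buf = ByteBuffer.allocate(18).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(SERIAL_COOKIE_NO_RUNCONTAINER); // format cookie
        buf.putInt(1);                             // number of containers
        buf.putShort((short) (value >>> 16));      // container key: high 16 bits of the value
        buf.putShort((short) 0);                   // container cardinality minus one
        buf.putInt(16);                            // byte offset where the container data starts
        buf.putShort((short) value);               // array container: low 16 bits of the value
        return buf.array();
    }

    /** Sums the per-container cardinalities in the header; handles only the no-run cookie. */
    static long cardinality(byte[] bitmap) {
        ByteBuffer buf = ByteBuffer.wrap(bitmap).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != SERIAL_COOKIE_NO_RUNCONTAINER) {
            throw new IllegalArgumentException("unsupported serialization cookie");
        }
        int containers = buf.getInt();
        long total = 0;
        for (int i = 0; i < containers; i++) {
            buf.getShort();                          // skip the container key
            total += (buf.getShort() & 0xFFFF) + 1L; // header stores cardinality minus one
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] bytes = build(42);
        System.out.println(bytes.length + " bytes, cardinality " + cardinality(bytes));
    }
}
```

Even for a single value, the caller has to know the cookie, header layout, and byte order, which is the kind of detail the proposed `rbm32_build` and `rbm32_cardinality` functions would hide from Flink SQL users.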
