matrixsparse opened a new issue, #2848:
URL: https://github.com/apache/fluss/issues/2848

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/fluss/issues) and found nothing similar.
   
   
   ### Motivation
   
   Fluss 0.9 introduced the Aggregation Merge Engine with the `rbm32` and `rbm64` aggregate functions, which enable precise, storage-level UV (unique visitor) counting via RoaringBitmap.
   
   However, inserting into `rbm32`/`rbm64` columns currently requires **manual client-side serialization**, because `FieldRoaringBitmap64Agg.agg()` expects both the `accumulator` and the `inputField` to already be serialized `Roaring64Bitmap`s in `byte[]` form.
   
   As a result, users must either:
   1. Write Java application code that serializes each `userId` into a single-element bitmap `byte[]`, or
   2. Implement and register a custom Flink `ScalarFunction` UDF in every project.
   
   This creates a significant barrier to adoption, especially for:
   - Users writing Flink SQL jobs, who cannot easily embed custom Java code
   - Users connecting from non-Java clients (Python, Go)
   - The **Dictionary Table + RoaringBitmap UV counting** pattern recommended in the 0.9 release notes, which requires converting an `INT` / `BIGINT` auto-increment ID into a serialized bitmap
   
   **Comparison with peer systems:**
   | System | Built-in bitmap functions |
   |--------|--------------------------|
   | ClickHouse | `bitmapBuild()`, `bitmapCardinality()`, `bitmapOr()` |
   | Apache Doris | `bitmap_from_array()`, `bitmap_count()`, `bitmap_union()` |
   | Apache Paimon | Requires manual serialization (same limitation) |
   | Apache Fluss | **Requires manual serialization (current state)** |
   
   ## Proposed Solution
   
   Add the following built-in functions to the **Flink connector** (`fluss-flink-common` module):
   
   ### Construction functions
   ```sql
   -- Create a single-element 32-bit bitmap from an INT value
   rbm32_build(value INT) → BYTES
   
   -- Create a single-element 64-bit bitmap from a BIGINT value  
   rbm64_build(value BIGINT) → BYTES
   ```
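   To make the expected output of `rbm32_build` concrete, here is a stdlib-only Java sketch that writes a single-element bitmap by hand. It assumes (not confirmed by this issue) that `rbm32` columns store bitmaps in RoaringBitmap's 32-bit portable serialization format, i.e. the bytes `RoaringBitmap#serialize` produces; the class and method names are hypothetical. A real built-in would more likely call `RoaringBitmap.bitmapOf(value)`, serialize the result, and expose it through a Flink `ScalarFunction` `eval` method. `rbm64_build` would do the same with `Roaring64Bitmap`, whose format this sketch does not cover.

   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;

   /**
    * Hypothetical sketch of rbm32_build(INT) -> BYTES.
    * ASSUMPTION: rbm32 columns hold bitmaps in the RoaringBitmap 32-bit portable
    * format, i.e. the bytes RoaringBitmap#serialize writes for bitmapOf(value).
    */
   public final class Rbm32Build {
       // Cookie used when no container is run-length encoded.
       private static final int SERIAL_COOKIE_NO_RUNCONTAINER = 12346;

       private Rbm32Build() {}

       /** Serializes a bitmap that contains exactly one (unsigned) 32-bit value. */
       public static byte[] rbm32Build(int value) {
           // Roaring splits a value into a 16-bit container key (high bits)
           // and a 16-bit entry inside that container (low bits).
           short key = (short) (value >>> 16);
           short low = (short) value;

           // cookie(4) + container count(4) + key/cardinality header(4)
           // + container offset(4) + one array-container entry(2) = 18 bytes,
           // all little-endian.
           ByteBuffer buf = ByteBuffer.allocate(18).order(ByteOrder.LITTLE_ENDIAN);
           buf.putInt(SERIAL_COOKIE_NO_RUNCONTAINER);
           buf.putInt(1);           // exactly one container
           buf.putShort(key);       // container key
           buf.putShort((short) 0); // cardinality - 1, i.e. one element
           buf.putInt(16);          // byte offset at which the container data starts
           buf.putShort(low);       // array container holding the low 16 bits
           return buf.array();
       }
   }
   ```

   For example, `rbm32Build(42)` returns an 18-byte array whose last two bytes encode `42`, which is exactly what users currently have to produce themselves before every insert.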
   
   ### Cardinality functions
   
   ```sql
   -- Get the number of distinct elements from a serialized 32-bit bitmap
   rbm32_cardinality(bitmap BYTES) → BIGINT
   
   -- Get the number of distinct elements from a serialized 64-bit bitmap
   rbm64_cardinality(bitmap BYTES) → BIGINT
   ```
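   A matching stdlib-only sketch of `rbm32_cardinality`, under the same assumption that the input uses RoaringBitmap's 32-bit portable format (hypothetical names, not an existing Fluss API). It only reads the per-container headers, since that format stores each container's cardinality there; a production version could instead deserialize into a `RoaringBitmap` and call `getLongCardinality()`, and the 64-bit variant would go through `Roaring64Bitmap`.

   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;

   /**
    * Hypothetical sketch of rbm32_cardinality(BYTES) -> BIGINT.
    * ASSUMPTION: the input uses the RoaringBitmap 32-bit portable format.
    */
   public final class Rbm32Cardinality {
       // Cookie used when no container is run-length encoded.
       private static final int SERIAL_COOKIE_NO_RUNCONTAINER = 12346;
       // Cookie (low 16 bits) used when at least one run container is present.
       private static final int SERIAL_COOKIE = 12347;

       private Rbm32Cardinality() {}

       /** Returns the number of distinct values, or null for a NULL input (SQL semantics). */
       public static Long rbm32Cardinality(byte[] bitmap) {
           if (bitmap == null) {
               return null;
           }
           ByteBuffer buf = ByteBuffer.wrap(bitmap).order(ByteOrder.LITTLE_ENDIAN);
           int cookie = buf.getInt();
           int containers;
           if (cookie == SERIAL_COOKIE_NO_RUNCONTAINER) {
               containers = buf.getInt(); // container count follows the cookie
           } else if ((cookie & 0xFFFF) == SERIAL_COOKIE) {
               containers = (cookie >>> 16) + 1; // (count - 1) is packed into the high 16 bits
               // Skip the bitset that marks which containers are run containers.
               buf.position(buf.position() + (containers + 7) / 8);
           } else {
               throw new IllegalArgumentException("not a 32-bit portable RoaringBitmap");
           }
           long cardinality = 0;
           for (int i = 0; i < containers; i++) {
               buf.getShort(); // 16-bit container key, not needed here
               // Each header stores (cardinality - 1) as an unsigned 16-bit value.
               cardinality += Short.toUnsignedInt(buf.getShort()) + 1;
           }
           return cardinality;
       }
   }
   ```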
   
   
   ### Solution
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Willingness to contribute
   
   - [ ] I'm willing to submit a PR!
