That’s great. I will also share my write-up with the UDFs I implemented, and if anything is still missing from the repo after you introduce the UDFs, maybe we can add it.
I will ping you to take a look once I create the blog post PR 🙏

Again, this is an exciting feature and I am looking forward to it.

Best,
Giannis

On Mon, 6 Apr 2026 at 1:17 PM, Prajwal Banakar <[email protected]> wrote:

> Hi Giannis,
>
> Thank you for the detailed and encouraging feedback. I have addressed your points as follows:
>
> 1. DataTypeVisitor Breaking Change: I agree. To avoid breaking external implementors, I will provide a default implementation that throws an UnsupportedOperationException with a clear message. This is now documented in the FIP under Layer 1.
>
> 2. NULL Semantics: Thank you for catching the contradiction. The intended behavior is that BITMAP_CARDINALITY(NULL) returns NULL (missing value), while BITMAP_CARDINALITY(empty_bitmap) returns 0. I have added an explicit clarification to the FIP to distinguish these cases.
>
> 3. 64-bit Scope: Agreed. Since FieldRoaringBitmap64Agg already exists server-side, I have added BITMAP_BUILD_AGG_64 (BIGINT → BITMAP) as a Phase 2 deliverable in the updated function table.
>
> 4. BITMAP_AND_AGG Safety: I have added a note stating that BITMAP_AND_AGG will execute entirely in Flink without server-side pushdown support in this release. I also clarified that combining it with the aggregation merge engine may produce unexpected results during compaction.
>
> Regarding BITMAP_TO_STRING, this is a great idea for debugging. I have mentioned it in my GSoC proposal, added it to the "Future Work" section of the FIP, and attributed the suggestion.
>
> I also wanted to mention that Jark and Yang Wang have set up an official flink-extended/flink-roaringbitmap repository for the external UDF library. I have an open PR there including rb_cardinality, rb_or_agg, and rb_build_agg. This may complement your current work and serve as a bridge until native support lands in Fluss.
> Link to the repo: https://github.com/flink-extended/flink-roaringbitmap
>
> Best regards,
> Prajwal Banakar
>
> On Mon, 6 Apr 2026 at 12:52, Giannis Polyzos <[email protected]> wrote:
>
> > Hi Prajwal,
> >
> > This is a great proposal. I have been creating end-to-end real-time profile use cases with Fluss features, and I had to create a few UDFs to better interact with bitmaps.
> > The FIP solves this structural problem as it makes BITMAP a first-class type, promotes those same operations into the catalog so no UDF jar is needed, and adds server-side pushdown so BITMAP_OR_AGG queries avoid materializing per-row data in Flink entirely.
> >
> > The overall direction is solid, the three-layer decomposition is clean, and the backward compatibility story is well thought out. I have a few points I would like to discuss before moving forward, mostly around design decisions that I think are worth aligning on at the proposal stage.
> >
> > 1. DataTypeVisitor is @PublicStable, but I'm wondering whether this can be a breaking change. Adding a new method to this interface would likely break any external implementors. It would be great if the proposal could settle on an approach: a default fallback method, an abstract base class users can extend, or a deferral to the next major version. Any of these would work.
> >
> > 2. The NULL semantics are contradictory within the proposal itself. The public interface section states BITMAP_CARDINALITY(NULL) → NULL, while the reference implementation section returns 0 for null input. Could you clarify which behavior is the intended one?
> >
> > 3. BITMAP_BUILD_AGG accepts only INT, while the server already supports rbm64 for BIGINT. Given that most real-world entity IDs exceed Integer.MAX_VALUE and the 64-bit aggregator already exists server-side, it would be worth clarifying whether the initial scope is intentionally limited to 32-bit and, if so, the reasoning behind that choice.
> >
> > 4. BITMAP_AND_AGG is included in the public interface but has no server-side aggregator and no pushdown support. I'm concerned that exposing this in the public API without a working server-side counterpart could lead to incorrect results when users combine it with the merge engine. Would it be possible to add a clear section defining exactly when it is safe to use?
> >
> > Some food for thought on my side is whether it makes sense (based on my examples) to consider adding a function that converts a bitmap to a string so it's human-readable for users who are debugging.
> >
> > For example, take a bitmap and output something like (bitmap_to_string??):
> > Output format: "count=3 [1001, 1002, 1007]."
> >
> > This can be out of scope for this FIP, and more of a future improvement, if it resonates.
> > I just wanted to bring this to your attention, in case you think it makes sense, because it's something that helped me a lot with my examples.
> >
> > Looking forward to this
> >
> > Best,
> > Giannis
> >
> > On Mon, Apr 6, 2026 at 9:12 AM Prajwal Banakar <[email protected]> wrote:
> >
> > > Hi everyone,
> > >
> > > I have created the formal Confluence page for FIP-37 [1].
> > >
> > > As this is my first time creating a FIP page, I would greatly appreciate any feedback or suggestions for improvement. The discussion will continue on this thread.
> > >
> > > Looking forward to your thoughts.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLUSS/FIP-37%3A+Native+RoaringBitmap+Integration+for+Apache+Fluss
> > >
> > > Best regards,
> > > Prajwal Banakar
> > >
> > > On Fri, 3 Apr 2026 at 19:22, Giannis Polyzos <[email protected]> wrote:
> > >
> > > > Hi Prajwal,
> > > > It’s probably a mistake that needs fixing.
> > > >
> > > > Feel free to use FIP-37
> > > >
> > > > Best,
> > > > Giannis
> > > >
> > > > On Fri, 3 Apr 2026 at 4:49 PM, Prajwal Banakar <[email protected]> wrote:
> > > >
> > > > > Hi devs,
> > > > >
> > > > > I am currently creating the Confluence page for this proposal.
> > > > >
> > > > > I noticed that FIP-35 is not currently listed on the wiki. Could you please confirm if this number is available for use, or if I should be assigned a different one?
> > > > >
> > > > > Best regards,
> > > > > Prajwal Banakar
> > > > >
> > > > > On Wed, 11 Mar 2026 at 22:56, Prajwal Banakar <[email protected]> wrote:
> > > > >
> > > > > > Hi Keith,
> > > > > >
> > > > > > Thank you for the follow-up.
> > > > > >
> > > > > > You are correct that FieldRoaringBitmap64Agg already exists in fluss-server. I have updated the proposal accordingly. To clarify, the 32-bit scope is intended to keep the initial type system and SQL function surface focused and deliverable, rather than being a limitation of the aggregator itself. Since the server-side aggregator is already in place, RBM64 will be a natural, low-risk follow-on once the type system and pushdown infrastructure are established.
> > > > > >
> > > > > > I have also removed the misleading motivation paragraph as you suggested. The updated document is available at the same link. Additionally, I would welcome Yang's input on the alignment with FIP-21.
> > > > > >
> > > > > > Best regards,
> > > > > > Prajwal Banakar
> > > > > >
> > > > > > On Wed, 11 Mar 2026 at 17:37, Keith Lee <[email protected]> wrote:
> > > > > >
> > > > > >> Hi Prajwal,
> > > > > >>
> > > > > >> Thank you for addressing / answering the questions.
> > > > > >>
> > > > > >> > This proposal adds the missing bridge: a proper BITMAP DDL type, SQL functions (BITMAP_BUILD, BITMAP_OR_AGG, BITMAP_CARDINALITY), and pushdown via applyAggregates(). The storage-side aggregation logic already exists; this proposal makes it accessible end-to-end
> > > > > >>
> > > > > >> 1. That makes sense. I think the motivation section should lead with that and remove the following, as it can be misleading given that rbm is supported by the aggregation merge engine: “users requiring high-cardinality unique counting (e.g., UV analytics) must execute Client-Side Aggregation. The TabletServer is forced to send massive amounts of raw LogRecordBatch rows over the network to a Flink cluster for evaluation. This results in unnecessary network transfer and prevents efficient utilization of the existing aggregation merge engine.”
> > > > > >>
> > > > > >> 2. That makes sense. Thank you for the context.
> > > > > >>
> > > > > >> 3.
> > > > > >>
> > > > > >> > RBM64 requires a fundamentally different internal structure: a map of RBM32 chunks, which increases implementation and serialization complexity significantly.
> > > > > >>
> > > > > >> My understanding is that the proposal wires the existing FieldRoaringBitmap32Agg to support rbm32. FieldRoaringBitmap64Agg should already exist and handle the complexity that you mentioned?
> > > > > >>
> > > > > >> Additionally, it might be good for Yang to review / provide input on this given his work on FIP-21.
> > > > > >>
> > > > > >> Best regards
> > > > > >>
> > > > > >> Keith Lee
> > > > > >>
> > > > > >> On Wed, 11 Mar 2026 at 05:49, Prajwal Banakar <[email protected]> wrote:
> > > > > >>
> > > > > >> > Hi Keith, thank you for the detailed feedback.
> > > > > >> >
> > > > > >> > 1. On motivation vs existing aggregation merge engine: The aggregation merge engine in 0.9 supports rbm32/rbm64 at the storage level, but BITMAP is not yet a first-class type in the DDL or type system. Users today must declare the column as BYTES (as shown in the 0.9 release example: uv_bitmap BYTES), and there are no SQL functions to build, merge, or query bitmaps from Flink SQL. This proposal adds the missing bridge: a proper BITMAP DDL type, SQL functions (BITMAP_BUILD, BITMAP_OR_AGG, BITMAP_CARDINALITY), and pushdown via applyAggregates(). The storage-side aggregation logic already exists; this proposal makes it accessible end-to-end.
> > > > > >> >
> > > > > >> > 2. On NULL semantics: BITMAP_OR(bitmap, NULL) returns NULL, following standard SQL scalar function semantics where NULL inputs propagate to NULL outputs. BITMAP_OR_AGG follows the aggregate function convention, consistent with how SUM and AVG behave, where NULLs in individual rows are skipped and only a fully NULL input set returns NULL. This distinction follows FLIP-556 and StarRocks semantics.
> > > > > >> >
> > > > > >> > 3. On 32-bit scope: The proposal is scoped to 32-bit initially because RoaringBitmap32 covers integer values up to 2^32 (~4 billion), which is sufficient for most user ID and session ID use cases. RBM64 requires a fundamentally different internal structure: a map of RBM32 chunks, which increases implementation and serialization complexity significantly. Starting with 32-bit keeps the initial scope focused and deliverable. RBM64 support is listed as a Could-Have in the MoSCoW deliverables and can follow in a subsequent iteration.
> > > > > >> >
> > > > > >> > Best regards,
> > > > > >> >
> > > > > >> > Prajwal Banakar
> > > > > >> >
> > > > > >> > On Wed, 11 Mar 2026 at 01:34, Keith Lee <[email protected]> wrote:
> > > > > >> >
> > > > > >> > > Hello Prajwal,
> > > > > >> > >
> > > > > >> > > Thank you for the detailed proposal. I enjoyed reading it and have a few questions/comments.
> > > > > >> > >
> > > > > >> > > 1. On motivation, can you provide context on how this differs from the aggregation merge engine’s roaring bitmap implementation [1]? Specifically, the motivation part states that “users requiring high cardinality unique counting … must execute client-side aggregation”. The aggregation merge engine performs aggregation on the server side. The motivation section should clarify how the proposed changes improve or complement the aggregation merge engine, which seems to have been considered, as Section 2 references FIP-21 Aggregation Merge Engine. Adding this context will help readers understand the motivation of the proposal better.
> > > > > >> > >
> > > > > >> > > 2. Can you clarify the NULL semantics section, specifically the decision on why BITMAP_OR(bitmap, NULL) returns NULL but BITMAP_OR_AGG only returns NULL when all rows are NULL?
> > > > > >> > >
> > > > > >> > > 3. Why is the scope limited to 32-bit bitmaps? Please add the rationale behind this, e.g. how (if at all) support for 64-bit bitmaps would increase implementation complexity. Articulating this may help other contributors understand the complexity and perhaps come up with suggestions on how to address it.
> > > > > >> > >
> > > > > >> > > Best regards
> > > > > >> > >
> > > > > >> > > Keith Lee
> > > > > >> > >
> > > > > >> > > [1] https://fluss.apache.org/blog/releases/0.9/#2-storage-level-processing--semantics
> > > > > >> > >
> > > > > >> > > On Mon, 9 Mar 2026 at 05:31, Prajwal Banakar <[email protected]> wrote:
> > > > > >> > >
> > > > > >> > > > Hi Devs,
> > > > > >> > > >
> > > > > >> > > > I have pushed a working prototype to my public fork demonstrating the BitmapType integrated with FieldRoaringBitmap32Agg. This includes four passing unit tests.
> > > > > >> > > >
> > > > > >> > > > The link to the prototype is available in the Google Doc, and you can also find it here:
> > > > > >> > > >
> > > > > >> > > > https://github.com/Prajwal-banakar/fluss/tree/RoaringBitmap-prototype
> > > > > >> > > >
> > > > > >> > > > The Google Doc link remains the same. I look forward to your feedback.
> > > > > >> > > >
> > > > > >> > > > Best regards,
> > > > > >> > > >
> > > > > >> > > > Prajwal Banakar
> > > > > >> > > >
> > > > > >> > > > On Sun, 1 Mar, 2026, 11:49 am Prajwal Banakar, <[email protected]> wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi everyone,
> > > > > >> > > > >
> > > > > >> > > > > I would like to start a discussion on the proposal for Native Bitmap Integration & Stateless Pushdown Aggregation.
> > > > > >> > > > >
> > > > > >> > > > > This proposal enables end-to-end native support for the BITMAP type in Fluss and integrates it with the existing aggregation merge engine to support server-side bitmap union pushdown. The goal is to reduce network transfer and offload DISTINCT-style aggregation from Flink to the TabletServer.
> > > > > >> > > > >
> > > > > >> > > > > Key highlights of the proposal include:
> > > > > >> > > > >
> > > > > >> > > > > - Type System: Promoting BITMAP to a first-class logical type.
> > > > > >> > > > > - UDF Suite: Introducing BITMAP_BUILD, BITMAP_OR_AGG, and BITMAP_CARDINALITY (aligned with FLIP-556 and StarRocks semantics).
> > > > > >> > > > > - Optimizer: Planner-based pushdown via applyAggregates in the Flink connector.
> > > > > >> > > > > - Safety: No changes to LogRecordBatch or WAL, making this strictly additive and migration-free.
> > > > > >> > > > >
> > > > > >> > > > > You can find the full proposal document here:
> > > > > >> > > > >
> > > > > >> > > > > https://docs.google.com/document/d/1sDhfkmo-w-UTvo2n3rsY1lytSSryswfkI83cSdka8s0/edit?usp=sharing
> > > > > >> > > > >
> > > > > >> > > > > I would appreciate feedback on the public interfaces, pushdown constraints, and overall scope.
> > > > > >> > > > >
> > > > > >> > > > > Best regards,
> > > > > >> > > > > Prajwal Banakar
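
For readers following the NULL-semantics points in this thread, here is a minimal sketch of the behavior as it is described above: BITMAP_CARDINALITY(NULL) returns NULL while an empty bitmap returns 0, scalar BITMAP_OR propagates NULL, and BITMAP_OR_AGG skips NULL rows. It is plain Python for illustration only and is not the FIP's reference implementation. The function names, the use of a frozenset as a stand-in for a RoaringBitmap, and the result for an empty input set are all assumptions. The string format follows the "count=3 [1001, 1002, 1007]" example suggested in the thread, reading the trailing period there as sentence punctuation.

```python
from typing import FrozenSet, Iterable, Optional

# Hypothetical stand-in: a bitmap is modelled as a frozenset of ints,
# and SQL NULL is modelled as Python None.
Bitmap = FrozenSet[int]


def bitmap_cardinality(bitmap: Optional[Bitmap]) -> Optional[int]:
    """NULL input yields NULL; an empty bitmap yields 0."""
    if bitmap is None:
        return None
    return len(bitmap)


def bitmap_or(left: Optional[Bitmap], right: Optional[Bitmap]) -> Optional[Bitmap]:
    """Scalar OR: any NULL argument propagates to a NULL result."""
    if left is None or right is None:
        return None
    return left | right


def bitmap_or_agg(rows: Iterable[Optional[Bitmap]]) -> Optional[Bitmap]:
    """Aggregate OR: NULL rows are skipped, as with SUM.

    Returns NULL when every row is NULL. Returning NULL for an empty
    input as well is an assumption made for this sketch.
    """
    result: Optional[Bitmap] = None
    for row in rows:
        if row is None:
            continue  # skip NULL rows instead of propagating them
        result = row if result is None else result | row
    return result


def bitmap_to_string(bitmap: Optional[Bitmap]) -> Optional[str]:
    """Debug rendering in the 'count=N [v1, v2, ...]' shape from the thread."""
    if bitmap is None:
        return None
    values = sorted(bitmap)
    return f"count={len(values)} {values}"


if __name__ == "__main__":
    rows = [frozenset({1001}), None, frozenset({1002, 1007})]
    print(bitmap_to_string(bitmap_or_agg(rows)))
```

The scalar and aggregate functions are kept separate on purpose, since the thread distinguishes NULL propagation in the former from NULL skipping in the latter.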
