That’s great. I will also share my write-up of the UDFs I implemented,
and if anything is missing from the repo after you introduce the UDFs,
maybe we can add it.

I will ping you to take a look once I create the blog post PR 🙏

Again, this is an exciting feature and I am looking forward to it.

Best,
Giannis

On Mon, 6 Apr 2026 at 1:17 PM, Prajwal Banakar <[email protected]>
wrote:

> Hi Giannis,
>
> Thank you for the detailed and encouraging feedback. I have addressed your
> points as follows:
>
> 1. DataTypeVisitor Breaking Change: I agree. To avoid breaking external
> implementors, I will provide a default implementation that throws an
> UnsupportedOperationException with a clear message. This is now documented
> in the FIP under Layer 1.
>
> 2. NULL Semantics: Thank you for catching the contradiction. The intended
> behavior is BITMAP_CARDINALITY(NULL) returns NULL (missing value), while
> BITMAP_CARDINALITY(empty_bitmap) returns 0. I have added an explicit
> clarification to the FIP to distinguish these cases.
>
> 3. 64-bit Scope: Agreed. Since FieldRoaringBitmap64Agg already exists
> server-side, I have added BITMAP_BUILD_AGG_64 (BIGINT → BITMAP) as a Phase
> 2 deliverable in the updated function table.
>
> 4. BITMAP_AND_AGG Safety: I have added a note stating that BITMAP_AND_AGG
> will execute entirely in Flink without server-side pushdown support in this
> release. I also clarified that combining it with the aggregation merge
> engine may produce unexpected results during compaction.
>
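Points 1 and 2 above can be illustrated with a minimal Java sketch. The
names TypeVisitorSketch, LegacyVisitor and BitmapNullSketch are hypothetical
and are not taken from the FIP or the Fluss codebase, and java.util.BitSet
is only a stand-in for a RoaringBitmap. The sketch assumes the new visitor
method gets a default body that throws, and that cardinality distinguishes a
NULL input from an empty bitmap.

```java
import java.util.BitSet;

// Hypothetical stand-in for a visitor interface that gains a new method.
interface TypeVisitorSketch<R> {
    R visitInt();

    // Added with a default body, so visitors written earlier still compile.
    default R visitBitmap() {
        throw new UnsupportedOperationException(
                "BITMAP is not supported by " + getClass().getName());
    }
}

final class BitmapNullSketch {
    // A visitor written before BITMAP existed: it only knows about INT.
    static final class LegacyVisitor implements TypeVisitorSketch<String> {
        @Override
        public String visitInt() {
            return "INT";
        }
    }

    // NULL in, NULL out; an empty (non-NULL) bitmap has cardinality 0.
    static Long cardinality(BitSet bitmap) {
        if (bitmap == null) {
            return null;
        }
        return (long) bitmap.cardinality();
    }
}
```

With this shape, LegacyVisitor keeps compiling unchanged and only fails, with
a clear message, if it is actually asked to visit a BITMAP.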
> Regarding BITMAP_TO_STRING, this is a great idea for debugging. I have
> mentioned it in my GSoC proposal, added it to the "Future Work" section
> of the FIP, and attributed the suggestion to you.
>
> I also wanted to mention that Jark and Yang Wang have set up an official
> flink-extended/flink-roaringbitmap repository for the external UDF library.
> I have an open PR there including rb_cardinality, rb_or_agg, and
> rb_build_agg. This may complement your current work and serve as a bridge
> until native support lands in Fluss.
> Link to the repo: https://github.com/flink-extended/flink-roaringbitmap
>
> Best regards,
> Prajwal Banakar
>
> On Mon, 6 Apr 2026 at 12:52, Giannis Polyzos <[email protected]>
> wrote:
>
> > Hi Prajwal,
> >
> > This is a great proposal. I have been creating end-to-end real-time
> > profile use cases with Fluss features, and I had to create a few UDFs
> > to better interact with bitmaps.
> > The FIP solves this structural problem: it makes BITMAP a first-class
> > type, promotes those same operations into the catalog so no UDF jar is
> > needed, and adds server-side pushdown so BITMAP_OR_AGG queries avoid
> > materializing per-row data in Flink entirely.
> >
> > The overall direction is solid, the three-layer decomposition is clean,
> > and the backward compatibility story is well thought out. I have a few
> > points I would like to discuss before moving forward, mostly around
> > design decisions that I think are worth aligning on at the proposal
> > stage.
> >
> > 1. DataTypeVisitor is @PublicStable, so I'm wondering whether this could
> > be a breaking change. Adding a new method to this interface would likely
> > break any external implementors. It would be great if the proposal could
> > settle on an approach: a default fallback method, an abstract base class
> > users can extend, or a deferral to the next major version. Any of these
> > would work.
> >
> > 2. The NULL semantics are contradictory within the proposal itself. The
> > public interface section states BITMAP_CARDINALITY(NULL) → NULL, while
> > the reference implementation section returns 0 for null input. Could you
> > clarify which behavior is the intended one?
> >
> > 3. BITMAP_BUILD_AGG accepts only INT, while the server already supports
> > rbm64 for BIGINT. Given that most real-world entity IDs exceed
> > Integer.MAX_VALUE and the 64-bit aggregator already exists server-side,
> > it would be worth clarifying whether the initial scope is intentionally
> > limited to 32-bit and, if so, the reasoning behind that choice.
> >
> > 4. BITMAP_AND_AGG is included in the public interface but has no
> > server-side aggregator and no pushdown support. I'm concerned that
> > exposing this in the public API without a working server-side
> > counterpart could lead to incorrect results when users combine it with
> > the merge engine. Would it be possible to add a clear section defining
> > exactly when it is safe to use?
> >
> > Some food for thought on my side: based on my examples, would it make
> > sense to add a function that converts a bitmap to a string, so it is
> > human-readable for users who are debugging?
> >
> > For example, take a bitmap and output something like (bitmap_to_string??):
> > Output format: "count=3 [1001, 1002, 1007]".
> >
> > This can be out of scope for this FIP and more of a future improvement,
> > if it resonates.
> > I just wanted to bring this to your attention in case you think it makes
> > sense, because it's something that helped me a lot with my examples.
> >
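The suggested output format can be sketched in a few lines of Java. This is
only an illustration: bitmapToString is a hypothetical helper, the NULL
handling is an assumption, and java.util.BitSet stands in for a
RoaringBitmap so the example needs no extra dependency.

```java
import java.util.BitSet;
import java.util.stream.Collectors;

final class BitmapToStringSketch {
    // Hypothetical helper: renders "count=N [v1, v2, ...]"; NULL stays NULL.
    static String bitmapToString(BitSet bitmap) {
        if (bitmap == null) {
            return null;
        }
        // BitSet.stream() yields the set values in ascending order.
        String values = bitmap.stream()
                .mapToObj(i -> Integer.toString(i))
                .collect(Collectors.joining(", ", "[", "]"));
        return "count=" + bitmap.cardinality() + " " + values;
    }
}
```

A real implementation would probably cap the number of printed values, since
a debugging string for a bitmap with millions of entries is not useful.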
> > Looking forward to this
> >
> > Best,
> > Giannis
> >
> >
> >
> > On Mon, Apr 6, 2026 at 9:12 AM Prajwal Banakar <[email protected]>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > I have created the formal Confluence page for FIP-37 [1].
> > >
> > > As this is my first time creating a FIP page, I would greatly
> > > appreciate any feedback or suggestions for improvement. The discussion
> > > will continue on this thread.
> > >
> > > Looking forward to your thoughts.
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLUSS/FIP-37%3A+Native+RoaringBitmap+Integration+for+Apache+Fluss
> > >
> > > Best regards,
> > > Prajwal Banakar
> > >
> > > On Fri, 3 Apr 2026 at 19:22, Giannis Polyzos <[email protected]>
> > > wrote:
> > >
> > > > Hi Prajwal,
> > > > It’s probably a mistake that needs fixing.
> > > >
> > > > Feel free to use FIP-37
> > > >
> > > > Best,
> > > > Giannis
> > > >
> > > > On Fri, 3 Apr 2026 at 4:49 PM, Prajwal Banakar <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi devs,
> > > > >
> > > > > I am currently creating the Confluence page for this proposal.
> > > > >
> > > > > I noticed that FIP-35 is not currently listed on the wiki. Could
> > > > > you please confirm whether this number is available for use, or
> > > > > whether I should use a different one?
> > > > >
> > > > > Best regards,
> > > > > Prajwal Banakar
> > > > >
> > > > > On Wed, 11 Mar 2026 at 22:56, Prajwal Banakar <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi Keith,
> > > > > >
> > > > > > Thank you for the follow-up.
> > > > > >
> > > > > > You are correct that FieldRoaringBitmap64Agg already exists in
> > > > > > fluss-server. I have updated the proposal accordingly. To
> > > > > > clarify, the 32-bit scope is intended to keep the initial type
> > > > > > system and SQL function surface focused and deliverable, rather
> > > > > > than being a limitation of the aggregator itself. Since the
> > > > > > server-side aggregator is already in place, RBM64 will be a
> > > > > > natural, low-risk follow-on once the type system and pushdown
> > > > > > infrastructure are established.
> > > > > >
> > > > > > I have also removed the misleading motivation paragraph as you
> > > > > > suggested. The updated document is available at the same link.
> > > > > > Additionally, I would welcome Yang's input on the alignment with
> > > > > > FIP-21.
> > > > > >
> > > > > > Best regards,
> > > > > > Prajwal Banakar
> > > > > >
> > > > > > On Wed, 11 Mar 2026 at 17:37, Keith Lee <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Prajwal,
> > > > > >>
> > > > > >> Thank you for addressing / answering the questions.
> > > > > >>
> > > > > > >> > This proposal adds the missing bridge: a proper BITMAP DDL
> > > > > > >> > type, SQL functions (BITMAP_BUILD, BITMAP_OR_AGG,
> > > > > > >> > BITMAP_CARDINALITY), and pushdown via applyAggregates(). The
> > > > > > >> > storage-side aggregation logic already exists; this proposal
> > > > > > >> > makes it accessible end-to-end
> > > > > >>
> > > > > > >> 1. That makes sense. I think the motivation section should
> > > > > > >> lead with that and remove the following, as it can be
> > > > > > >> misleading given that rbm is supported by the aggregation merge
> > > > > > >> engine: “users requiring high-cardinality unique counting
> > > > > > >> (e.g., UV analytics) must execute Client-Side Aggregation. The
> > > > > > >> TabletServer is forced to send massive amounts of raw
> > > > > > >> LogRecordBatch rows over the network to a Flink cluster for
> > > > > > >> evaluation. This results in unnecessary network transfer and
> > > > > > >> prevents efficient utilization of the existing aggregation
> > > > > > >> merge engine.”
> > > > > >>
> > > > > >> 2. That makes sense. Thank you for the context.
> > > > > >>
> > > > > >> 3.
> > > > > >>
> > > > > > >> > RBM64 requires a fundamentally different internal structure;
> > > > > > >> > a map of RBM32 chunks which increases implementation and
> > > > > > >> > serialization complexity significantly.
> > > > > >>
> > > > > > >> My understanding is that the proposal wires the existing
> > > > > > >> FieldRoaringBitmap32Agg to support rbm32. Shouldn't
> > > > > > >> FieldRoaringBitmap64Agg already exist and handle the complexity
> > > > > > >> that you mentioned?
> > > > > > >>
> > > > > > >> Additionally, it might be good for Yang to review / provide
> > > > > > >> input on this given his work on FIP-21.
> > > > > >>
> > > > > >> Best regards
> > > > > >>
> > > > > >> Keith Lee
> > > > > >>
> > > > > >>
> > > > > > >> On Wed, 11 Mar 2026 at 05:49, Prajwal Banakar
> > > > > > >> <[email protected]> wrote:
> > > > > >>
> > > > > >> > Hi Keith, thank you for the detailed feedback.
> > > > > >> >
> > > > > > >> > 1. On motivation vs existing aggregation merge engine: The
> > > > > > >> > aggregation merge engine in 0.9 supports rbm32/rbm64 at the
> > > > > > >> > storage level, but BITMAP is not yet a first-class type in
> > > > > > >> > the DDL or type system. Users today must declare the column
> > > > > > >> > as BYTES (as shown in the 0.9 release example: uv_bitmap
> > > > > > >> > BYTES), and there are no SQL functions to build, merge, or
> > > > > > >> > query bitmaps from Flink SQL. This proposal adds the missing
> > > > > > >> > bridge: a proper BITMAP DDL type, SQL functions
> > > > > > >> > (BITMAP_BUILD, BITMAP_OR_AGG, BITMAP_CARDINALITY), and
> > > > > > >> > pushdown via applyAggregates(). The storage-side aggregation
> > > > > > >> > logic already exists; this proposal makes it accessible
> > > > > > >> > end-to-end.
> > > > > >> >
> > > > > > >> > 2. On NULL semantics: BITMAP_OR(bitmap, NULL) returns NULL,
> > > > > > >> > following standard SQL scalar function semantics where NULL
> > > > > > >> > inputs propagate to NULL outputs. BITMAP_OR_AGG follows the
> > > > > > >> > aggregate function convention, consistent with how SUM and
> > > > > > >> > AVG behave: NULLs in individual rows are skipped, and only a
> > > > > > >> > fully NULL input set returns NULL. This distinction follows
> > > > > > >> > FLIP-556 and StarRocks semantics.
> > > > > >> >
> > > > > > >> > 3. On 32-bit scope: The proposal is scoped to 32-bit
> > > > > > >> > initially because RoaringBitmap32 covers integer values up to
> > > > > > >> > 2^32 (~4 billion), which is sufficient for most user ID and
> > > > > > >> > session ID use cases. RBM64 requires a fundamentally
> > > > > > >> > different internal structure, a map of RBM32 chunks, which
> > > > > > >> > increases implementation and serialization complexity
> > > > > > >> > significantly. Starting with 32-bit keeps the initial scope
> > > > > > >> > focused and deliverable. RBM64 support is listed as a
> > > > > > >> > Could-Have in the MoSCoW deliverables and can follow in a
> > > > > > >> > subsequent iteration.
> > > > > >> >
> > > > > >> > Best regards,
> > > > > >> >
> > > > > >> > Prajwal Banakar
> > > > > >> >
> > > > > >> >
> > > > > > >> > On Wed, 11 Mar 2026 at 01:34, Keith Lee
> > > > > > >> > <[email protected]> wrote:
> > > > > >> >
> > > > > >> > > Hello Prajwal,
> > > > > >> > >
> > > > > > >> > > Thank you for the detailed proposal. I enjoyed reading it
> > > > > > >> > > and have a few questions/comments.
> > > > > >> > >
> > > > > > >> > > 1. On motivation, can you provide context on how this
> > > > > > >> > > differs from the aggregation merge engine’s roaring bitmap
> > > > > > >> > > implementation [1]? Specifically, the motivation part
> > > > > > >> > > states that “users requiring high cardinality unique
> > > > > > >> > > counting … must execute client-side aggregation”. The
> > > > > > >> > > aggregation merge engine performs aggregation on the
> > > > > > >> > > server side. The motivation section should clarify how the
> > > > > > >> > > proposed changes improve or complement the aggregation
> > > > > > >> > > merge engine, which seems to have been considered, as
> > > > > > >> > > Section 2 references FIP-21 Aggregation Merge Engine.
> > > > > > >> > > Adding this context will help readers understand the
> > > > > > >> > > motivation of the proposal better.
> > > > > >> > >
> > > > > > >> > > 2. Can you clarify the NULL semantics section, specifically
> > > > > > >> > > the decision on why BITMAP_OR(bitmap, NULL) returns NULL
> > > > > > >> > > but BITMAP_OR_AGG only returns NULL when all rows are NULL?
> > > > > >> > >
> > > > > > >> > > 3. Why is the scope limited to 32-bit bitmaps? It would
> > > > > > >> > > help to add the rationale behind this, e.g. how (if at
> > > > > > >> > > all) supporting 64-bit bitmaps would increase
> > > > > > >> > > implementation complexity. Articulating this may help other
> > > > > > >> > > contributors understand the complexity and perhaps come up
> > > > > > >> > > with suggestions on how to address it.
> > > > > >> > >
> > > > > >> > > Best regards
> > > > > >> > >
> > > > > >> > > Keith Lee
> > > > > >> > >
> > > > > >> > > [1]
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://fluss.apache.org/blog/releases/0.9/#2-storage-level-processing--semantics
> > > > > >> > >
> > > > > >> > >
> > > > > > >> > > On Mon, 9 Mar 2026 at 05:31, Prajwal Banakar
> > > > > > >> > > <[email protected]> wrote:
> > > > > >> > >
> > > > > >> > > > Hi Devs,
> > > > > >> > > >
> > > > > > >> > > > I have pushed a working prototype to my public fork
> > > > > > >> > > > demonstrating the BitmapType integrated with
> > > > > > >> > > > FieldRoaringBitmap32Agg. This includes four passing unit
> > > > > > >> > > > tests.
> > > > > >> > > >
> > > > > > >> > > > The link to the prototype is available in the Google Doc,
> > > > > > >> > > > and you can also find it here:
> > > > > > >> > > > https://github.com/Prajwal-banakar/fluss/tree/RoaringBitmap-prototype
> > > > > >> > > >
> > > > > > >> > > > The Google Doc link remains the same. I look forward to
> > > > > > >> > > > your feedback.
> > > > > >> > > >
> > > > > >> > > > Best regards,
> > > > > >> > > >
> > > > > >> > > > Prajwal Banakar
> > > > > >> > > >
> > > > > >> > > >
> > > > > > >> > > > On Sun, 1 Mar, 2026, 11:49 am Prajwal Banakar,
> > > > > > >> > > > <[email protected]> wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi everyone,
> > > > > >> > > > >
> > > > > > >> > > > > I would like to start a discussion on the proposal for
> > > > > > >> > > > > Native Bitmap Integration & Stateless Pushdown
> > > > > > >> > > > > Aggregation.
> > > > > > >> > > > >
> > > > > > >> > > > > This proposal enables end-to-end native support for the
> > > > > > >> > > > > BITMAP type in Fluss and integrates it with the
> > > > > > >> > > > > existing aggregation merge engine to support
> > > > > > >> > > > > server-side bitmap union pushdown. The goal is to
> > > > > > >> > > > > reduce network transfer and offload DISTINCT-style
> > > > > > >> > > > > aggregation from Flink to the TabletServer.
> > > > > >> > > > >
> > > > > >> > > > > Key highlights of the proposal include:
> > > > > >> > > > >
> > > > > > >> > > > > - Type System: Promoting BITMAP to a first-class
> > > > > > >> > > > >   logical type.
> > > > > > >> > > > > - UDF Suite: Introducing BITMAP_BUILD, BITMAP_OR_AGG,
> > > > > > >> > > > >   and BITMAP_CARDINALITY (aligned with FLIP-556 and
> > > > > > >> > > > >   StarRocks semantics).
> > > > > > >> > > > > - Optimizer: Planner-based pushdown via applyAggregates
> > > > > > >> > > > >   in the Flink connector.
> > > > > > >> > > > > - Safety: No changes to LogRecordBatch or WAL, making
> > > > > > >> > > > >   this strictly additive and migration-free.
> > > > > >> > > > >
> > > > > > >> > > > > You can find the full proposal document here:
> > > > > > >> > > > > https://docs.google.com/document/d/1sDhfkmo-w-UTvo2n3rsY1lytSSryswfkI83cSdka8s0/edit?usp=sharing
> > > > > >> > > > >
> > > > > > >> > > > > I would appreciate feedback on the public interfaces,
> > > > > > >> > > > > pushdown constraints, and overall scope.
> > > > > >> > > > >
> > > > > >> > > > > Best regards,
> > > > > >> > > > > Prajwal Banakar
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
