Hi Cheng,

Thanks for starting this discussion. This is a very useful feature for
enabling the RoaringBitmap use case.

I have two concerns regarding the current design:

*1) Explicit insertion into auto-increment columns*
I share Yuxia’s concern: allowing explicit inserts into auto-increment
columns breaks the fundamental guarantee of uniqueness. This could silently
corrupt RoaringBitmap results, and such issues would be extremely
difficult to debug.
I recommend disallowing explicit inserts unless we can guarantee
uniqueness. We can support explicit inserts in the future once we have
such a solution.
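
To make the failure mode concrete, here is a minimal Java sketch. It is not the FIP-16 API: `ExplicitInsertSketch`, `insert` and `hasDuplicateIds` are hypothetical names, and a single in-memory counter stands in for the real ID allocator. When an explicit value is accepted, the counter does not know about it and later hands out the same ID again:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the FIP-16 API: a table whose auto-increment column is
// filled from a single in-memory counter.
public class ExplicitInsertSketch {
    private final AtomicLong nextId = new AtomicLong(1);     // next ID the allocator hands out
    private final Map<String, Long> rows = new HashMap<>();  // primary key -> auto-increment ID
    private final boolean allowExplicit;

    public ExplicitInsertSketch(boolean allowExplicit) {
        this.allowExplicit = allowExplicit;
    }

    /** Inserts a row; explicitId == null means "let the allocator assign the ID". */
    public long insert(String key, Long explicitId) {
        long id;
        if (explicitId != null) {
            if (!allowExplicit) {
                throw new IllegalArgumentException(
                        "explicit values for the auto-increment column are not allowed");
            }
            // The counter is not advanced, so it may later hand out this same value.
            id = explicitId;
        } else {
            id = nextId.getAndIncrement();
        }
        rows.put(key, id);
        return id;
    }

    /** Returns true if two different primary keys map to the same ID. */
    public boolean hasDuplicateIds() {
        return rows.values().stream().distinct().count() < rows.size();
    }

    public static void main(String[] args) {
        ExplicitInsertSketch permissive = new ExplicitInsertSketch(true);
        permissive.insert("alice", 2L);   // explicit value
        permissive.insert("bob", null);   // allocator assigns 1
        permissive.insert("carol", null); // allocator assigns 2, colliding with "alice"
        System.out.println(permissive.hasDuplicateIds()); // true

        ExplicitInsertSketch strict = new ExplicitInsertSketch(false);
        try {
            strict.insert("alice", 2L);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Constructing it with `allowExplicit = false` rejects explicit values up front, which corresponds to the restriction recommended above.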

*2) ID range persistence across restarts/failovers*
Currently, the ID range is cached only in memory on the tablet server. If
the server restarts or fails over, the unused part of the cached range is
lost, leaving large gaps in the ID sequence. This leads to sparse
RoaringBitmaps and degrades their performance. So I think we need a
mechanism to persist and restore the last allocated ID range to keep the
bitmaps dense.
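
A rough Java sketch of what persisting and restoring the range could look like, under stated assumptions: `GlobalRangeCounter` stands in for a global counter (e.g. one kept in ZooKeeper), and `AllocatorState` is a hypothetical per-bucket record assumed to be persisted atomically with the rows that consumed the IDs (for example, recovered from the bucket's own replicated data). That assumption is essential: if the persisted state could lag behind the written rows, resuming from it could reuse IDs. None of these names come from FIP-16.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the FIP-16 design.
public class AutoIncRangeSketch {

    /** Stand-in for a global counter (e.g. kept in ZooKeeper) that hands out disjoint ID ranges. */
    static final class GlobalRangeCounter {
        private final AtomicLong highWaterMark = new AtomicLong(0); // last ID reserved by any bucket

        /** Reserves the next `size` IDs; the counter only moves forward, so ranges never overlap. */
        long[] reserve(long size) {
            long end = highWaterMark.addAndGet(size);
            return new long[] {end - size + 1, end};
        }
    }

    /** Per-bucket state, ASSUMED to be persisted atomically with the rows that used the IDs. */
    static final class AllocatorState {
        final long lastAssigned;
        final long rangeEnd;

        AllocatorState(long lastAssigned, long rangeEnd) {
            this.lastAssigned = lastAssigned;
            this.rangeEnd = rangeEnd;
        }
    }

    /** Caches one range in memory and, given persisted state, resumes inside it after a restart. */
    static final class BucketIdAllocator {
        private final GlobalRangeCounter global;
        private final long cacheSize;
        private long next;     // next ID to hand out
        private long rangeEnd; // last ID of the currently reserved range

        BucketIdAllocator(GlobalRangeCounter global, long cacheSize, AllocatorState restored) {
            this.global = global;
            this.cacheSize = cacheSize;
            if (restored != null && restored.lastAssigned < restored.rangeEnd) {
                // Continue inside the range reserved before the restart: no gap.
                next = restored.lastAssigned + 1;
                rangeEnd = restored.rangeEnd;
            } else {
                // No usable state: reserve a fresh range; unused IDs of any old range are lost.
                reserveNewRange();
            }
        }

        private void reserveNewRange() {
            long[] range = global.reserve(cacheSize);
            next = range[0];
            rangeEnd = range[1];
        }

        long nextId() {
            if (next > rangeEnd) {
                reserveNewRange();
            }
            return next++;
        }

        /** State to persist with each write so the range can be restored on restart or failover. */
        AllocatorState snapshot() {
            return new AllocatorState(next - 1, rangeEnd);
        }
    }

    public static void main(String[] args) {
        // Scenario A: restart without persisted state.
        GlobalRangeCounter counterA = new GlobalRangeCounter();
        BucketIdAllocator before = new BucketIdAllocator(counterA, 100_000, null);
        before.nextId(); before.nextId(); before.nextId(); // assigns 1, 2, 3
        BucketIdAllocator afterRestart = new BucketIdAllocator(counterA, 100_000, null);
        System.out.println(afterRestart.nextId()); // 100001: IDs 4..100000 are never used

        // Scenario B: restart that restores the persisted state.
        GlobalRangeCounter counterB = new GlobalRangeCounter();
        BucketIdAllocator b = new BucketIdAllocator(counterB, 100_000, null);
        b.nextId(); b.nextId(); b.nextId();
        BucketIdAllocator restored = new BucketIdAllocator(counterB, 100_000, b.snapshot());
        System.out.println(restored.nextId()); // 4
    }
}
```

The idea is to persist only two numbers per bucket (last assigned ID and range end) and to reserve a fresh range from the global counter only when the old one is exhausted or no state is available, so a restart no longer throws away the rest of a cached range.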

Looking forward to your thoughts!

Best,
Jark




On Sun, 21 Sept 2025 at 04:56, Mehul Batra <[email protected]> wrote:

> Hi Cheng,
>
> Thanks for driving this; it's a much-needed feature for making the
> stack production-ready for real-world scenarios.
> The design makes sense to me. I have a few small questions:
>
> - *Cache Coordination*: When a tablet server fails and its cached IDs
> (e.g., 50,000-100,000) are lost, how does ZooKeeper ensure those IDs are
> never reused? Does it maintain a global highest allocated counter?
> - *Cross-bucket Dependencies*: In the example, bucket 1 gets [1-100,000]
> and bucket 2 gets [100,001-200,000]. What happens if tablet server
> containing bucket 1 goes down permanently? Will there always be gaps in the
> sequence?
> - *Race Conditions*: If two Flink workers simultaneously lookup the same
> non-existent primary key, could both trigger insertIfNotExists and create
> duplicate auto-increment IDs? How is this prevented?
> - How should users choose the right table.auto_inc_cache_size? Should we
> put a max cap on it to avoid overburdening the system?
>
> Best Regards,
> Mehul Batra
>
> On Fri, Sep 19, 2025 at 5:24 PM Yang Wang <[email protected]>
> wrote:
>
> > Hi Cheng,
> >
> > Thank you for driving this FIP. I think it is a nice and important feature
> > for many real-world business scenarios, and the overall design makes sense
> > to me. I have just one small question:
> > Regarding the client-side API design:
> > ```
> > Schema.newBuilder()
> >         .column("uid", DataTypes.STRING())
> >         .column("uid_int64", DataTypes.BIGINT())
> >         .enableAutoIncrement()
> >         .primaryKey("uid")
> >         .build();
> > ```
> > If there is more than one column with INT or BIGINT type, which one would
> > be the auto-increment column?
> >
> > Best regards,
> > Yang
> >
> > On Thu, Sep 18, 2025 at 22:49 Wang Cheng <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > >
> > > Auto-increment columns are a bread-and-butter feature for improving data
> > > management efficiency. They are the bedrock of many analytical workloads,
> > > such as real-time unique visitor (UV) counting.
> > >
> > >
> > > To implement this capability, I'd like to propose FIP-16: auto-increment
> > > column [1].
> > >
> > >
> > > Any feedback and suggestions on this proposal are welcome!
> > >
> > >
> > > [1]:
> > > https://cwiki.apache.org/confluence/display/FLUSS/FIP-16%3A+Auto-Increment+Column
> > >
> > > Regards,
> > > Cheng
> >
>
