Re: [DISCUSS] FIP-16: auto-increment column

Wang Cheng Mon, 22 Sep 2025 05:40:12 -0700

Hi Jark,


Thanks for your feedback.


1. The FIP design has been updated to state that the value of an
AUTO_INCREMENT column can only be assigned implicitly.
2. I agree that frequent server restarts could lead to sparse allocation
of AUTO_INCREMENT values and accelerate their exhaustion. Letting a tablet
server resume from its cached values after a failure would likely require
a sophisticated checkpoint & restore mechanism. As far as I know, most
database systems that rely on in-memory cached local IDs (such as Doris
and TiDB) discard these cached IDs during failover for simplicity. I think
we can refine our failure-handling protocol in the future. What do you think?
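
To make the trade-off in point 2 concrete, here is a minimal sketch, not the FIP's actual implementation. It assumes a shared counter from which each server reserves a segment of IDs; an `AtomicLong` stands in for whatever durable store holds that counter, and `SegmentAllocator`, `reserveSegment`, and `restart` are hypothetical names. On restart the unused remainder of the cached segment is simply dropped, so IDs are never reused but a gap appears.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SegmentAllocator {
    // Hypothetical stand-in for a durable, shared high-water mark.
    private final AtomicLong globalNext;
    private final long cacheSize;
    private long next; // next ID to hand out from the cached segment
    private long end;  // exclusive end of the cached segment

    public SegmentAllocator(AtomicLong globalNext, long cacheSize) {
        this.globalNext = globalNext;
        this.cacheSize = cacheSize;
        reserveSegment();
    }

    // Atomically claims [next, next + cacheSize) from the shared counter.
    private void reserveSegment() {
        next = globalNext.getAndAdd(cacheSize);
        end = next + cacheSize;
    }

    public synchronized long nextId() {
        if (next == end) {
            reserveSegment(); // cached segment exhausted
        }
        return next++;
    }

    // Simulates a failover: the in-memory remainder is discarded and a
    // fresh segment is reserved, leaving a gap in the sequence.
    public synchronized void restart() {
        reserveSegment();
    }

    public static void main(String[] args) {
        AtomicLong global = new AtomicLong(1);
        SegmentAllocator server = new SegmentAllocator(global, 100_000);
        long a = server.nextId();
        long b = server.nextId();
        server.restart();
        long c = server.nextId();
        System.out.println(a + ", " + b + ", " + c); // prints "1, 2, 100001"
    }
}
```

In this sketch a restart after two allocations wastes 99,998 IDs, which is the sparsity cost being traded against the simplicity of keeping no per-server checkpoint.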



Regards,
Cheng







------------------ Original ------------------
From: "dev" <[email protected]>;
Date: Sun, Sep 21, 2025 09:51 PM
To: "dev" <[email protected]>;
Subject: Re: [DISCUSS] FIP-16: auto-increment column



Hi Cheng,

Thanks for starting this discussion. This is a very useful feature for
enabling roaring bitmap use cases.

I have two concerns regarding the current design:

*1) Explicit insertion into auto-increment columns*
I share Yuxia's concern: allowing explicit inserts into auto-increment
columns breaks the fundamental guarantee of uniqueness. This could silently
corrupt roaring bitmap results, and such issues would be extremely
difficult to debug.
I recommend disallowing explicit inserts unless we can guarantee
uniqueness. We can support explicit insert in the future when we have such
a solution.
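
A small sketch of this failure mode, assuming a generator that does not track explicitly written values (class and method names are hypothetical, not Fluss code). An explicitly inserted value is later handed out again by the counter, so two distinct keys end up mapped to the same integer and a bitmap would count them as one.

```java
import java.util.HashSet;
import java.util.Set;

public class ExplicitInsertCollision {
    /**
     * Returns the first generated ID that collides with the explicitly
     * inserted one, or -1 if no collision occurs within rowsToGenerate rows.
     */
    public static long firstCollision(long explicitId, long rowsToGenerate) {
        Set<Long> used = new HashSet<>();
        used.add(explicitId); // user-supplied value written directly into the column
        long counter = 1;     // hypothetical generator, unaware of explicit values
        for (long i = 0; i < rowsToGenerate; i++) {
            long id = counter++;
            if (!used.add(id)) { // add() returns false if the value is already present
                return id;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstCollision(5, 10)); // prints 5
    }
}
```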

*2) ID range persistence across restarts/failovers*
Currently, the ID range is cached only in memory on the tablet server. If
the server restarts or fails over, large gaps will appear in the ID
sequence. This leads to sparse roaring bitmaps and degrades their
performance. So I think we need a mechanism to persist and restore the
last allocated ID range to keep the bitmaps dense.
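
A rough illustration of why gaps hurt, assuming the usual roaring layout in which values are grouped into chunks of 2^16 that share their high bits (64-bit variants differ in how they index the chunks, but the grouping idea is the same). The sketch counts how many chunks a set of IDs touches; `ContainerCount` and its methods are hypothetical helpers, and the segment size and restart frequency are made-up numbers.

```java
import java.util.HashSet;
import java.util.Set;

public class ContainerCount {
    /** Number of distinct 2^16-value chunks touched by the given IDs. */
    public static int chunksTouched(long[] ids) {
        Set<Long> chunks = new HashSet<>();
        for (long id : ids) {
            chunks.add(id >>> 16); // IDs sharing the same high bits land in one chunk
        }
        return chunks.size();
    }

    /**
     * Generates total IDs. After every perSegment IDs a restart is simulated:
     * the rest of the cached segment is lost and allocation jumps ahead by
     * segmentSize.
     */
    public static long[] idsWithRestarts(int total, int perSegment, long segmentSize) {
        long[] ids = new long[total];
        long segmentStart = 0;
        for (int i = 0; i < total; i++) {
            int offset = i % perSegment;
            if (i > 0 && offset == 0) {
                segmentStart += segmentSize; // simulated restart
            }
            ids[i] = segmentStart + offset;
        }
        return ids;
    }

    public static void main(String[] args) {
        long[] dense = idsWithRestarts(60_000, 60_000, 100_000); // no restart
        long[] sparse = idsWithRestarts(60_000, 1_000, 100_000); // restart every 1,000 IDs
        System.out.println(chunksTouched(dense) + " vs " + chunksTouched(sparse));
    }
}
```

With these numbers the same 60,000 IDs fit in a single chunk when allocated contiguously, but touch at least 60 chunks when a restart happens after every 1,000 allocations.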

Looking forward to your thoughts!

Best,
Jark




On Sun, 21 Sept 2025 at 04:56, Mehul Batra <[email protected]> wrote:

> Hi Cheng,
>
> Thanks for driving this, it's a needed feature to leap forward making the
> stack production ready for real-world scenarios.
> Design made sense to me, I have small questions:
>
> - *Cache Coordination*: When a tablet server fails and its cached IDs
> (e.g., 50,000-100,000) are lost, how does ZooKeeper ensure those IDs are
> never reused? Does it maintain a global highest allocated counter?
> - *Cross-bucket Dependencies*: In the example, bucket 1 gets [1-100,000]
> and bucket 2 gets [100,001-200,000]. What happens if tablet server
> containing bucket 1 goes down permanently? Will there always be gaps in the
> sequence?
> - *Race Conditions*: If two Flink workers simultaneously lookup the same
> non-existent primary key, could both trigger insertIfNotExists and create
> duplicate auto-increment IDs? How is this prevented?
> - How should users decide the right table.auto_inc_cache_size? Should we
> put a max cap on this to avoid overburden
>
> Best Regards,
> Mehul Batra
>
> On Fri, Sep 19, 2025 at 5:24 PM Yang Wang <[email protected]> wrote:
>
> > Hi Cheng,
> >
> > Thank you for driving this FIP. I think it is a nice and important feature
> > for many real-world business scenarios, and the overall design makes sense
> > to me. I have just one small question:
> > Regarding the client-side API design:
> > ```
> > Schema.newBuilder()
> >         .column("uid", DataTypes.STRING())
> >         .column("uid_int64", DataTypes.BIGINT())
> >         .enableAutoIncrement()
> >         .primaryKey("uid")
> >         .build();
> > ```
> > If there is more than one column with INT or BIGINT type, which one would
> > be the auto-increment column?
> >
> > Best regards,
> > Yang
> >
> > On Sep 18, 2025 at 22:49, Wang Cheng <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > >
> > > Auto-increment column is a bread-and-butter feature for improving data
> > > management efficiency. It is the bedrock of many features in analytical
> > > workloads, such as those in real-time unique visitor (UV) counting
> > > scenarios.
> > >
> > >
> > > To implement this capability, I'd like to propose FIP-16: auto-increment
> > > column [1].
> > >
> > >
> > > Any feedback and suggestions on this proposal are welcome!
> > >
> > >
> > > [1]:
> > > https://cwiki.apache.org/confluence/display/FLUSS/FIP-16%3A+Auto-Increment+Column
> > >
> > >
> > > Regards,
> > > Cheng
