Re: [DISCUSS] FIP-16: auto-increment column

Wang Cheng Mon, 22 Sep 2025 05:43:48 -0700

Hi Mehul,


Thanks for your comments.


1. When a tablet server restarts, its in-memory cache of local IDs is lost. It
will then invoke the add [1] method of ZooKeeper DistributedAtomicLong to
request a new batch of IDs. ZooKeeper DistributedAtomicLong acts as a
globally synchronized counter that only issues monotonically increasing
values, so IDs from the lost batch are never handed out again. If the values
of DistributedAtomicLong are exhausted, an error will be thrown.
2. Yes, if the tablet server holding bucket 1 (range 1-100,000) fails
permanently, those cached but unused IDs are lost forever, creating gaps in the
sequence. As highlighted in the proposal under "monotonicity", Fluss
prioritizes performance and simplicity and therefore does not guarantee that
the values of the AUTO_INCREMENT column are strictly monotonic. It only ensures
that the values roughly increase in chronological order.
3. In your scenario, once both requests confirm that the target primary key
does not exist, they will both proceed to initiate an insert operation. However,
a write lock in the insertion path guards against concurrent write conflicts.
Crucially, after a request acquires the write lock, it must recheck whether the
primary key exists before proceeding with the actual insert. This two-step
check, coupled with the write lock, ensures that only one request can complete
the insertion, thereby preventing the generation of duplicate auto-increment
IDs.
4. The cache size should be tuned based on insert volume. For high-frequency 
insert operations, a larger cache is recommended for optimal performance.
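
To make points 1 to 3 concrete, here is a minimal sketch. It is not the actual
Fluss implementation: the class, field, and method names are hypothetical, an
in-process AtomicLong stands in for the ZooKeeper DistributedAtomicLong, and
persistence of the stored rows is left out. It only illustrates batch
reservation from a shared counter and the check, lock, recheck sequence.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only; none of these names come from Fluss.
class AutoIncrementSketch {
    // Assumption: stand-in for the ZooKeeper-backed global counter.
    private final AtomicLong globalCounter;
    // Plays the role of table.auto_inc_cache_size.
    private final long cacheSize;
    // Locally cached batch; lost whenever this object is recreated ("restart").
    private long nextLocalId = 1;
    private long batchEnd = 0;

    private final Map<String, Long> rows = new ConcurrentHashMap<>();
    private final ReentrantLock writeLock = new ReentrantLock();

    AutoIncrementSketch(AtomicLong globalCounter, long cacheSize) {
        this.globalCounter = globalCounter;
        this.cacheSize = cacheSize;
    }

    // Caller must hold writeLock. Reserves a new batch when the cache is empty.
    private long allocateId() {
        if (nextLocalId > batchEnd) {
            // The shared counter only moves forward, so batches never overlap.
            batchEnd = globalCounter.addAndGet(cacheSize);
            nextLocalId = batchEnd - cacheSize + 1;
        }
        return nextLocalId++;
    }

    // Returns the ID for primaryKey, assigning a new one only if it is absent.
    long insertIfNotExists(String primaryKey) {
        Long existing = rows.get(primaryKey);      // first check, no lock
        if (existing != null) {
            return existing;
        }
        writeLock.lock();
        try {
            existing = rows.get(primaryKey);       // recheck under the lock
            if (existing != null) {
                return existing;                   // another request won the race
            }
            long id = allocateId();
            rows.put(primaryKey, id);
            return id;
        } finally {
            writeLock.unlock();
        }
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(0);
        AutoIncrementSketch serverA = new AutoIncrementSketch(counter, 100_000);
        AutoIncrementSketch serverB = new AutoIncrementSketch(counter, 100_000);
        System.out.println(serverA.insertIfNotExists("u1")); // 1
        System.out.println(serverB.insertIfNotExists("u2")); // 100001
        System.out.println(serverA.insertIfNotExists("u1")); // 1 (already present)
        // Simulated restart of server A: its cached batch is gone, so the
        // next ID comes from a fresh batch and 2..100000 become a gap.
        AutoIncrementSketch restartedA = new AutoIncrementSketch(counter, 100_000);
        System.out.println(restartedA.insertIfNotExists("u3")); // 200001
    }
}
```

In this sketch the shared counter is contacted once per cacheSize inserts,
which is the trade-off behind point 4: a larger cache means fewer round-trips
to the counter but a larger potential gap after a restart.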


[1] 
https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/atomic/DistributedAtomicLong.html#add(java.lang.Long)

Regards,
Cheng







------------------ Original ------------------
From: "dev" <[email protected]>;
Date: Sun, Sep 21, 2025 04:55 AM
To: "dev" <[email protected]>;

Subject: Re: [DISCUSS] FIP-16: auto-increment column



Hi Cheng,

Thanks for driving this. It's a much-needed feature and a leap forward in
making the stack production-ready for real-world scenarios.
The design makes sense to me. I have a few small questions:

- *Cache Coordination*: When a tablet server fails and its cached IDs
(e.g., 50,000-100,000) are lost, how does ZooKeeper ensure those IDs are
never reused? Does it maintain a global highest allocated counter?
- *Cross-bucket Dependencies*: In the example, bucket 1 gets [1-100,000]
and bucket 2 gets [100,001-200,000]. What happens if the tablet server
containing bucket 1 goes down permanently? Will there always be gaps in the
sequence?
- *Race Conditions*: If two Flink workers simultaneously look up the same
non-existent primary key, could both trigger insertIfNotExists and create
duplicate auto-increment IDs? How is this prevented?
- How should users decide the right table.auto_inc_cache_size? Should we
put a max cap on this to avoid overburden?

Best Regards,
Mehul Batra

On Fri, Sep 19, 2025 at 5:24 PM Yang Wang <[email protected]> wrote:

> Hi Cheng,
>
> Thank you for driving this FIP. I think it is a nice and important feature
> for many real-world business scenarios, and the overall design makes sense
> to me. I have just one small question:
> Regarding the client-side API design:
> ```
> Schema.newBuilder()
>         .column("uid", DataTypes.STRING())
>         .column("uid_int64", DataTypes.BIGINT())
>         .enableAutoIncrement()
>         .primaryKey("uid")
>         .build();
> ```
> If there is more than one column with INT or BIGINT type, which one would
> be the auto-increment column?
>
> Best regards,
> Yang
>
> Wang Cheng <[email protected]> wrote on Thu, Sep 18, 2025 at 22:49:
>
> > Hi all,
> >
> >
> > Auto-increment column is a bread-and-butter feature for improving data
> > management efficiency. It is the bedrock of many features in analytical
> > workloads, such as those in real-time unique visitor (UV) counting
> > scenarios.
> >
> >
> > To implement this capability, I'd like to propose FIP-16: auto-increment
> > column [1].
> >
> >
> > Any feedback and suggestions on this proposal are welcome!
> >
> >
> > [1]:
> > https://cwiki.apache.org/confluence/display/FLUSS/FIP-16%3A+Auto-Increment+Column
> >
> >
> > Regards,
> > Cheng
>
