Thank you Cheng. The updates look good to me.

Best,
Jark

On Mon, 27 Oct 2025 at 17:27, Wang Cheng <[email protected]> wrote:

> Thank you Jark for your comments.
>
> > I think we need to support the INT type for the AUTO_INCREMENT column,
> > because roaring bitmap32 is more commonly used: it is enough for
> > most cases and cheaper than rbm64.
>
> I agree with this point. Roaring Bitmap32 is more prevalent due to its
> better performance. I have revised the FIP to specify that the data type of
> the AUTO_INCREMENT column must be INT or BIGINT.
>
> > The znode looks like an index node for the table; renaming it to
> > something more descriptive would improve clarity:
> > /metadata/databases/[databaseName]/tables/[tableName]/auto_inc/col_[columnIdx]
>
> I have renamed the znode path according to your suggestion.
>
> > What happens to the other columns? If this only updates the columns
> > other than the auto-incremented column, this will be very strange behavior
> > and will conflict with the upsert semantics. According to the previous
> > discussion, we don't allow insert/upsert on the auto-incremented column, so
> > this should throw an exception directly without checking the nullability of
> > the auto-incremented column.
>
> I think there is some ambiguity in the UPSERT section. The UPSERT behavior
> has been clarified as follows: "Assigning values to an AUTO_INCREMENT
> column during UPSERT operations is prohibited, irrespective of target row
> existence. If the row is newly inserted into the table, Fluss fills the
> AUTO_INCREMENT column with a new auto-incremented ID."
>
> > I suggest not introducing the `AutoIncIDBuffer` in the first version,
> > or at least not introducing these 2 options, as they become public API.
> > This may conflict with the persisted ID approach that we will soon
> > introduce.
>
> I have removed those 2 options from the FIP, considering that they may not be
> compatible with our upcoming bucket state snapshot feature.
> I have also revised the failover handling section accordingly: "To eliminate
> gaps during failover, local cached IDs will be persisted as part of the
> upcoming bucket state snapshot, ensuring their availability after server
> restart."
>
> Regards,
> Cheng
>
> ------------------ Original ------------------
> From: "dev" <[email protected]>
> Date: Sun, Oct 26, 2025 04:33 PM
> To: "dev" <[email protected]>
> Subject: Re: [DISCUSS] FIP-16: auto-increment column
>
> Hi Cheng,
>
> Sorry for interrupting the vote. However, I spotted some issues we may
> need to address.
>
> > The data type of the AUTO_INCREMENT column must be BIGINT.
>
> I think we need to support the INT type for the AUTO_INCREMENT column,
> because roaringbitmap32 is more commonly used: it is enough for most cases
> and cheaper than rbm64.
>
> > For UPSERT operations, the following situations occur: If the row already
> > exists in the table, Fluss does not update the auto-incremented ID.
>
> What happens to the other columns? If this only updates the columns other
> than the auto-incremented column, this will be very strange behavior and
> will conflict with the upsert semantics.
> According to the previous discussion, we don't allow insert/upsert on the
> auto-incremented column, so this should throw an exception directly without
> checking the nullability of the auto-incremented column.
>
> > ZooKeeper with the znode path being
> > /metadata/databases/[databaseName]/tables/[tableName]/autoinc/idx_[columnIdx]
>
> The znode looks like an index node for the table; renaming it to something
> more descriptive would improve clarity:
> /metadata/databases/[databaseName]/tables/[tableName]/auto_inc/col_[columnIdx]
>
> > The prefetch batch size and low watermark ratio are controlled by
> > configuration parameters table.auto_inc_cache_size and
> > table.auto_inc_low_water_mark_size_ratio respectively.
>
> I suggest not introducing the `AutoIncIDBuffer` in the first version, or at
> least not introducing these 2 options, as they become public API. This may
> conflict with the persisted ID approach that we will soon introduce.
>
> Best,
> Jark
>
> On Tue, 23 Sept 2025 at 10:13, Wang Cheng <[email protected]> wrote:
>
> > Hi Giannis,
> >
> > Thanks for your comments.
> >
> > 1. That makes sense. I'll update the enableAutoIncrement() method to
> > accept the column name as a parameter.
> > 2. Once the local cached IDs are used up, the bucket will request a new
> > batch from ZooKeeper.
> > 3. The default cache size of 100,000 is inspired by the modern OLAP
> > database StarRocks, and should suffice for most use cases. I think we can
> > add a note suggesting that tables with high-frequency inserts should set a
> > larger number for better performance.
> >
> > Regards,
> > Cheng
> >
> > ------------------ Original ------------------
> > From: "dev" <[email protected]>
> > Date: Mon, Sep 22, 2025 10:51 PM
> > To: "dev" <[email protected]>
> > Subject: Re: [DISCUSS] FIP-16: auto-increment column
> >
> > Hi Cheng and thank you for driving this 🙏
> >
> > My first question in terms of the API design is also whether the API
> > .enableAutoIncrement() should take the column name as an argument,
> > so it's more intuitive and clear.
> >
> > My extra comments are:
> > 1. What happens if a bucket reaches its threshold, i.e. has a key range
> > [1, 100,000] and hits the upper bound? (If it's mentioned and I missed it,
> > please ignore my comment.)
> >
> > 2. Based on my experience with Paimon, the record number (depending on the
> > record size) might range between 1 and 10 million records. In most of my
> > experiments, with autoscaling buckets, I always had 1 million rows per
> > bucket.
> > So I'm thinking maybe it's better to make the default threshold
> > larger.
> >
> > Best,
> > Giannis
> >
> > On Mon, 22 Sep 2025 at 3:41 PM, Wang Cheng <[email protected]> wrote:
> >
> > > Hi Mehul,
> > >
> > > Thanks for your comments.
> > >
> > > 1. When a tablet server restarts, its in-memory local cached IDs are
> > > lost. It will then invoke the add [1] method of ZooKeeper
> > > DistributedAtomicLong to request a new batch of IDs. ZooKeeper
> > > DistributedAtomicLong acts as a globally synchronized counter that only
> > > issues monotonically increasing values. If the values of
> > > DistributedAtomicLong are exhausted, an error will be thrown.
> > > 2. Yes, if the tablet server holding bucket 1 (range 1–100,000) fails
> > > permanently, those cached but unused IDs are lost forever, creating gaps
> > > in the sequence. As highlighted in the proposal under "monotonicity", to
> > > prioritize performance and simplicity, Fluss does not guarantee that the
> > > values for the AUTO_INCREMENT column are strictly monotonic. It can only
> > > be ensured that the values roughly increase in chronological order.
> > > 3. In your scenario, once both requests confirm that the target primary
> > > key does not exist, they will proceed to initiate an insert operation.
> > > However, a write lock in the insertion path acts as a safeguard against
> > > concurrent write conflicts. Crucially, after a request successfully
> > > acquires the write lock, it must recheck the existence of the primary key
> > > once more before proceeding with the actual insert. This two-step
> > > verification coupled with the write lock ensures that only one request
> > > can ultimately complete the insertion, thereby preventing the generation
> > > of duplicate auto-increment IDs.
> > > 4. The cache size should be tuned based on insert volume. For
> > > high-frequency insert operations, a larger cache is recommended for
> > > optimal performance.
> > >
> > > [1] https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/atomic/DistributedAtomicLong.html#add(java.lang.Long)
> > >
> > > Regards,
> > > Cheng
> > >
> > > ------------------ Original ------------------
> > > From: "dev" <[email protected]>
> > > Date: Sun, Sep 21, 2025 04:55 AM
> > > To: "dev" <[email protected]>
> > > Subject: Re: [DISCUSS] FIP-16: auto-increment column
> > >
> > > Hi Cheng,
> > >
> > > Thanks for driving this, it's a needed feature to leap forward making the
> > > stack production ready for real-world scenarios.
> > > Design made sense to me, I have small questions:
> > >
> > > - *Cache Coordination*: When a tablet server fails and its cached IDs
> > > (e.g., 50,000-100,000) are lost, how does ZooKeeper ensure those IDs are
> > > never reused? Does it maintain a global highest allocated counter?
> > > - *Cross-bucket Dependencies*: In the example, bucket 1 gets [1-100,000]
> > > and bucket 2 gets [100,001-200,000]. What happens if the tablet server
> > > containing bucket 1 goes down permanently? Will there always be gaps in
> > > the sequence?
> > > - *Race Conditions*: If two Flink workers simultaneously look up the same
> > > non-existent primary key, could both trigger insertIfNotExists and create
> > > duplicate auto-increment IDs? How is this prevented?
> > > - How should users decide the right table.auto_inc_cache_size? Should we
> > > put a max cap on this to avoid overburden?
> > >
> > > Best Regards,
> > > Mehul Batra
> > >
> > > On Fri, Sep 19, 2025 at 5:24 PM Yang Wang <[email protected]> wrote:
> > >
> > > > Hi Cheng,
> > > >
> > > > Thank you for driving this FIP. I think it is a nice and important
> > > > feature for many real-world business scenarios, and the overall design
> > > > makes sense to me.
> > > > I have just one small question:
> > > > Regarding the client-side API design:
> > > > ```
> > > > Schema.newBuilder()
> > > >         .column("uid", DataTypes.STRING())
> > > >         .column("uid_int64", DataTypes.BIGINT())
> > > >         .enableAutoIncrement()
> > > >         .primaryKey("uid")
> > > >         .build();
> > > > ```
> > > > If there is more than one column with INT or BIGINT type, which one
> > > > would be the auto-increment column?
> > > >
> > > > Best regards,
> > > > Yang
> > > >
> > > > On Thu, Sep 18, 2025 at 22:49, Wang Cheng <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Auto-increment column is a bread-and-butter feature for improving data
> > > > > management efficiency. It is the bedrock of many features in analytical
> > > > > workloads, such as those in real-time unique visitor (UV) counting
> > > > > scenarios.
> > > > >
> > > > > To implement this capability, I'd like to propose FIP-16:
> > > > > auto-increment column [1].
> > > > >
> > > > > Any feedback and suggestions on this proposal are welcome!
> > > > >
> > > > > [1]: https://cwiki.apache.org/confluence/display/FLUSS/FIP-16%3A+Auto-Increment+Column
> > > > >
> > > > > Regards,
> > > > > Cheng
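
Note on the batch allocation discussed in this thread (a bucket caches a range of IDs, requests a new batch from a global counter once the range is used up, and loses the unused part of its range on restart): the following is only a minimal Java sketch of that idea, not Fluss code. `BucketIdCache` and its members are hypothetical names, and a plain in-process `AtomicLong` stands in for the `DistributedAtomicLong` from Apache Curator that the thread links to. Exhaustion handling and the INT versus BIGINT range check are left out.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-bucket ID cache; illustrative only, not a Fluss class. */
final class BucketIdCache {
    // Stand-in for the ZooKeeper-backed global counter discussed in the thread.
    private final AtomicLong globalCounter;
    private final long batchSize;
    private long next;          // next ID to hand out from the cached range
    private long endExclusive;  // first ID past the cached range

    BucketIdCache(AtomicLong globalCounter, long batchSize) {
        this.globalCounter = globalCounter;
        this.batchSize = batchSize;
        // next == endExclusive == 0 means "no range cached yet".
    }

    /** Returns the next ID, reserving a fresh batch when the cache is empty. */
    synchronized long nextId() {
        if (next >= endExclusive) {
            // Atomically reserve batchSize IDs; the counter only moves forward,
            // so a range that was handed out is never handed out again.
            long end = globalCounter.addAndGet(batchSize);
            next = end - batchSize + 1;
            endExclusive = end + 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong();
        BucketIdCache bucket1 = new BucketIdCache(counter, 100_000);
        BucketIdCache bucket2 = new BucketIdCache(counter, 100_000);
        System.out.println("bucket 1 first ID: " + bucket1.nextId());
        System.out.println("bucket 2 first ID: " + bucket2.nextId());

        // Simulated restart: the in-memory range of bucket 1 is gone, so the
        // replacement starts from a new batch and the old range leaves a gap.
        BucketIdCache bucket1Restarted = new BucketIdCache(counter, 100_000);
        System.out.println("bucket 1 after restart: " + bucket1Restarted.nextId());
    }
}
```

With a batch size of 100,000 this reproduces the ranges used as the example in the thread: bucket 1 draws from [1, 100,000] and bucket 2 from [100,001, 200,000]. The restarted bucket never revisits the unused part of its old range, which is the gap behavior described in the thread.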

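
Note on the race-condition answer in this thread (check for the key, take the write lock, recheck the key, then insert): a minimal Java sketch of that recheck-under-lock pattern follows. It is an illustration under assumptions, not the Fluss insertion path; `InsertIfNotExistsSketch`, `lookupOrInsert` and `idsIssued` are hypothetical names, an in-memory `HashMap` stands in for the table, and a local `AtomicLong` stands in for the ID source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of lookup-then-insert with a recheck under the write lock. */
final class InsertIfNotExistsSketch {
    private final Map<String, Long> rows = new HashMap<>();       // primary key -> auto-increment ID
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicLong idSource = new AtomicLong();         // stand-in ID generator

    long lookupOrInsert(String primaryKey) {
        // Step 1: optimistic lookup under the read lock.
        lock.readLock().lock();
        try {
            Long existing = rows.get(primaryKey);
            if (existing != null) {
                return existing;
            }
        } finally {
            lock.readLock().unlock();
        }

        // Step 2: take the write lock and recheck, because another request may
        // have inserted the same key between the lookup above and this point.
        lock.writeLock().lock();
        try {
            Long existing = rows.get(primaryKey);
            if (existing != null) {
                return existing;
            }
            long id = idSource.incrementAndGet();
            rows.put(primaryKey, id);
            return id;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Number of IDs handed out so far. */
    long idsIssued() {
        return idSource.get();
    }

    public static void main(String[] args) throws InterruptedException {
        InsertIfNotExistsSketch table = new InsertIfNotExistsSketch();
        long[] ids = new long[2];
        Thread t1 = new Thread(() -> ids[0] = table.lookupOrInsert("user-42"));
        Thread t2 = new Thread(() -> ids[1] = table.lookupOrInsert("user-42"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("same ID for both requests: " + (ids[0] == ids[1]));
        System.out.println("IDs issued: " + table.idsIssued());
    }
}
```

Because an ID is only drawn after the recheck succeeds inside the write lock, two concurrent requests for the same missing key consume a single ID in this sketch.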