Re: [DISCUSS] FIP-16: auto-increment column

Jark Wu Sun, 26 Oct 2025 01:34:28 -0700

Hi Cheng,

Sorry for interrupting the vote. However, I spotted some issues we may
need to address.


> The data type of the AUTO_INCREMENT column must be BIGINT.

I think we should also support the INT type for AUTO_INCREMENT columns. The
32-bit roaring bitmap (roaringbitmap32) is more commonly used than rbm64
because it is sufficient for most cases and cheaper.
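
To illustrate what INT support would imply for ID allocation, here is a
minimal sketch of an upper-bound check per column type. The class and method
names are hypothetical and not part of the FIP; type names are plain strings
only to keep the example self-contained.

```java
// Hypothetical helper, for illustration only; not an existing Fluss class.
final class AutoIncTypeRange {
    private AutoIncTypeRange() {}

    /** Largest ID representable by the given AUTO_INCREMENT column type. */
    static long maxIdFor(String columnType) {
        switch (columnType) {
            case "INT":
                return Integer.MAX_VALUE;
            case "BIGINT":
                return Long.MAX_VALUE;
            default:
                throw new IllegalArgumentException(
                        "Unsupported AUTO_INCREMENT type: " + columnType);
        }
    }

    /** True if the allocated ID can be stored in the column without overflow. */
    static boolean fits(long allocatedId, String columnType) {
        return allocatedId >= 0 && allocatedId <= maxIdFor(columnType);
    }
}
```

With an INT column the allocator would have to report exhaustion once the
next ID no longer fits, rather than silently wrapping around.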

> For UPSERT operations, the following situations occur: If the row already
exists in the table, Fluss does not update the auto-incremented ID.

What happens to the other columns? If the upsert updates every column except
the auto-incremented one, that is very strange behavior and conflicts with
upsert semantics.
According to the previous discussion, we don't allow insert/upsert on the
auto-incremented column, so this should throw an exception directly, without
checking the nullability of the auto-incremented column.
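
A rough sketch of the behavior I have in mind, assuming the write path knows
the target column names up front. The class and method names are hypothetical
and only illustrate the "reject directly" rule; they are not an existing
Fluss API.

```java
import java.util.List;

// Hypothetical validator, for illustration only.
final class AutoIncWriteValidator {
    private AutoIncWriteValidator() {}

    /**
     * Rejects any insert/upsert whose target columns include the
     * auto-increment column, regardless of the value supplied for it.
     */
    static void rejectAutoIncTarget(List<String> targetColumns, String autoIncColumn) {
        if (targetColumns.contains(autoIncColumn)) {
            throw new IllegalArgumentException(
                    "Column '" + autoIncColumn
                            + "' is AUTO_INCREMENT and cannot be written explicitly");
        }
    }
}
```

The check depends only on which columns are targeted, so no nullability
inspection of the supplied value is needed.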

> ZooKeeper with the znode path being
/metadata/databases/[databaseName]/tables/[tableName]/autoinc/idx_[columnIdx]

The znode looks like an index node for the table. Renaming it to something
more descriptive would improve clarity:
/metadata/databases/[databaseName]/tables/[tableName]/auto_inc/col_[columnIdx]
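
For illustration, the suggested layout could be built like this. The helper
class is hypothetical; only the path template comes from the suggestion above.

```java
// Hypothetical path helper, for illustration only.
final class AutoIncZNodePaths {
    private AutoIncZNodePaths() {}

    /** Builds the znode path for the auto-increment counter of one column. */
    static String columnPath(String databaseName, String tableName, int columnIdx) {
        if (columnIdx < 0) {
            throw new IllegalArgumentException("columnIdx must be non-negative: " + columnIdx);
        }
        return String.format(
                "/metadata/databases/%s/tables/%s/auto_inc/col_%d",
                databaseName, tableName, columnIdx);
    }
}
```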

> The prefetch batch size and low watermark ratio are controlled by
configuration parameters table.auto_inc_cache_size and
table.auto_inc_low_water_mark_size_ratio respectively.

I suggest not introducing the `AutoIncIDBuffer` in the first version, or at
least not introducing these two options, because they would become public API
and may conflict with the persisted ID approach that we will soon introduce.


Best,
Jark

On Tue, 23 Sept 2025 at 10:13, Wang Cheng <[email protected]> wrote:

> Hi Giannis,
>
>
> Thanks for your comments.
>
>
> 1. That makes sense. I'll update the enableAutoIncrement() method to
> accept the column name as a parameter.
> 2. Once the local cached IDs are used up, the bucket will request a new
> batch from ZooKeeper.
> 3. The default cache size of 100,000 is inspired by StarRocks, a modern
> OLAP database, and should suffice for most use cases. I think we can add a
> note suggesting that tables with high-frequency inserts should set a larger
> value for better performance.
>
>
>
> Regards,
> Cheng
>
> ------------------ Original ------------------
> From: "dev" <[email protected]>
> Date: Mon, Sep 22, 2025 10:51 PM
> To: "dev" <[email protected]>
> Subject: Re: [DISCUSS] FIP-16: auto-increment column
>
> Hi Cheng and thank you for driving this 🙏
>
> My first question, on the API design, is whether .enableAutoIncrement()
> should take the column name as an argument, so that it's more intuitive and
> clear.
>
> My extra comments are:
> 1. What happens if a bucket reaches its threshold, i.e., it has a key range
> of [1, 100,000] and hits the upper bound? (If it's mentioned and I missed
> it, please ignore my comment.)
>
> 2. Based on my experience with Paimon, the record count (depending on the
> record size) might range between 1 and 10 million records. In most of my
> experiments with autoscaling buckets, I always had 1 million rows per
> bucket. So I'm thinking it may be better to make the default threshold
> larger.
>
> Best,
> Giannis
>
> On Mon, 22 Sep 2025 at 3:41 PM, Wang Cheng <[email protected]> wrote:
>
> > Hi Mehul,
> >
> > Thanks for your comments.
> >
> > 1. When a tablet server restarts, its in-memory locally cached IDs are
> > lost. It will then invoke the add [1] method of ZooKeeper
> > DistributedAtomicLong to request a new batch of IDs. ZooKeeper
> > DistributedAtomicLong acts as a globally synchronized counter that only
> > issues monotonically increasing values. If the values of
> > DistributedAtomicLong are exhausted, an error will be thrown.
> > 2. Yes, if the tablet server holding bucket 1 (range 1–100,000) fails
> > permanently, those cached but unused IDs are lost forever, creating gaps
> > in the sequence. As highlighted in the proposal under "monotonicity",
> > Fluss does not guarantee that the values for the AUTO_INCREMENT column
> > are strictly monotonic, in order to prioritize performance and
> > simplicity. It can only be ensured that the values roughly increase in
> > chronological order.
> > 3. In your scenario, once both requests confirm that the target primary
> > key does not exist, they will proceed to initiate an insert operation.
> > However, a write lock in the insertion path acts as a safeguard against
> > concurrent write conflicts. Crucially, after a request successfully
> > acquires the write lock, it must recheck the existence of the primary key
> > once more before proceeding with the actual insert. This two-step
> > verification coupled with the write lock ensures that only one request
> > can ultimately complete the insertion, thereby preventing the generation
> > of duplicate auto-increment IDs.
> > 4. The cache size should be tuned based on insert volume. For
> > high-frequency insert operations, a larger cache is recommended for
> > optimal performance.
> >
> > [1]
> > https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/atomic/DistributedAtomicLong.html#add(java.lang.Long)
> >
> > Regards,
> > Cheng
> >
> > ------------------ Original ------------------
> > From: "dev" <[email protected]>
> > Date: Sun, Sep 21, 2025 04:55 AM
> > To: "dev" <[email protected]>
> > Subject: Re: [DISCUSS] FIP-16: auto-increment column
> >
> > Hi Cheng,
> >
> > Thanks for driving this. It's a needed feature and a leap forward in
> > making the stack production-ready for real-world scenarios.
> > The design makes sense to me. I have a few small questions:
> >
> > - *Cache Coordination*: When a tablet server fails and its cached IDs
> > (e.g., 50,000-100,000) are lost, how does ZooKeeper ensure those IDs are
> > never reused? Does it maintain a global highest-allocated counter?
> > - *Cross-bucket Dependencies*: In the example, bucket 1 gets [1-100,000]
> > and bucket 2 gets [100,001-200,000]. What happens if the tablet server
> > containing bucket 1 goes down permanently? Will there always be gaps in
> > the sequence?
> > - *Race Conditions*: If two Flink workers simultaneously look up the same
> > non-existent primary key, could both trigger insertIfNotExists and create
> > duplicate auto-increment IDs? How is this prevented?
> > - How should users decide the right table.auto_inc_cache_size? Should we
> > put a max cap on this to avoid overburden?
> >
> > Best Regards,
> > Mehul Batra
> &gt;
> > On Fri, Sep 19, 2025 at 5:24 PM Yang Wang <[email protected]> wrote:
> &gt;
> > > Hi Cheng,
> > >
> > > Thank you for driving this FIP. I think it is a nice and important
> > > feature for many real-world business scenarios, and the overall design
> > > makes sense to me. I have just one small question:
> > > Regarding the client-side API design:
> > > ```
> > > Schema.newBuilder()
> > >         .column("uid", DataTypes.STRING())
> > >         .column("uid_int64", DataTypes.BIGINT())
> > >         .enableAutoIncrement()
> > >         .primaryKey("uid")
> > >         .build();
> > > ```
> > > If there is more than one column with INT or BIGINT type, which one
> > > would be the auto-increment column?
> > >
> > > Best regards,
> > > Yang
> > >
> > > Wang Cheng <[email protected]> wrote on Thu, Sep 18, 2025 at 22:49:
> > >
> > > > Hi all,
> > > >
> > > > Auto-increment column is a bread-and-butter feature for improving
> > > > data management efficiency. It is the bedrock of many features in
> > > > analytical workloads, such as those in real-time unique visitor (UV)
> > > > counting scenarios.
> > > >
> > > > To implement this capability, I'd like to propose FIP-16:
> > > > auto-increment column [1].
> > > >
> > > > Any feedback and suggestions on this proposal are welcome!
> > > >
> > > > [1]:
> > > > https://cwiki.apache.org/confluence/display/FLUSS/FIP-16%3A+Auto-Increment+Column
> > > >
> > > > Regards,
> > > > Cheng
