Re: [I] Support for Identity Columns in Apache Iceberg [iceberg]

via GitHub Mon, 17 Feb 2025 10:33:39 -0800


RussellSpitzer commented on issue #12297:
URL: https://github.com/apache/iceberg/issues/12297#issuecomment-2663858540


   I'm not quite sure how the "Proposed Implementation" would actually work.
   We probably need more concrete details here, so I would recommend starting
   a design doc. The proposal should also integrate with the existing concept
   of identifier fields: https://iceberg.apache.org/spec/#identifier-field-ids .
   
   Auto-incrementing or otherwise generating values for rows probably also
   needs considerably more discussion, and probably its own design doc.
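   
   As a purely illustrative sketch of the kind of detail that would need
   pinning down, here is one way values could be generated when a row does not
   supply one: reserve a contiguous range from a high-water mark. Everything
   here (IdentityAllocator, fill_identity, the high-water-mark approach) is a
   hypothetical assumption for discussion, not an Iceberg API and not the
   proposed design.

```python
from dataclasses import dataclass


@dataclass
class IdentityAllocator:
    """Hypothetical allocator for generated values; not an Iceberg API."""

    high_water_mark: int = 0  # largest generated value handed out so far
    step: int = 1

    def reserve(self, count: int) -> range:
        """Reserve `count` values and advance the high-water mark."""
        if count < 0:
            raise ValueError("count must be non-negative")
        start = self.high_water_mark + self.step
        stop = start + count * self.step
        self.high_water_mark += count * self.step
        return range(start, stop, self.step)


def fill_identity(rows: list[dict], column: str,
                  allocator: IdentityAllocator) -> list[dict]:
    """Return copies of `rows` with missing `column` values filled in."""
    missing = sum(1 for row in rows if row.get(column) is None)
    values = iter(allocator.reserve(missing))
    return [
        {**row, column: next(values)} if row.get(column) is None else dict(row)
        for row in rows
    ]


if __name__ == "__main__":
    allocator = IdentityAllocator()
    batch = [{"id": None, "name": "a"}, {"id": 10, "name": "b"}, {"name": "c"}]
    print(fill_identity(batch, "id", allocator))
```

   Note what this sketch leaves open: explicitly provided values are not
   reconciled with the high-water mark, so a later generated value could
   collide with one, and concurrent writers would need the mark to be
   committed atomically with the snapshot. Those are the questions a design
   doc would have to answer.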
   
   On Mon, Feb 17, 2025 at 1:48 AM Nguyễn Quốc Vương ***@***.***>
   wrote:
   
   > Feature Request / Improvement
   >
   > *Summary*:
   > Apache Iceberg should support identity columns similar to Delta Lake. This
   > feature would allow users to define identity columns in Iceberg tables,
   > where unique values are automatically generated when not explicitly
   > provided during writes.
   >
   > *Motivation*:
   > Currently, Apache Iceberg does not provide built-in support for identity
   > columns. In contrast, Delta Lake allows defining identity columns that
   > generate unique values when users do not explicitly provide them. This
   > feature simplifies the handling of primary keys and auto-incrementing IDs
   > in use cases such as:
   >
   >    - Maintaining unique row identifiers in tables without requiring
   >      external sequence management.
   >    - Enabling better support for incremental ingestion scenarios where
   >      records require unique IDs.
   >    - Reducing complexity for users transitioning from traditional
   >      databases that support auto-incrementing primary keys.
   >
   > *Proposed Implementation*:
   >
   >    - Introduce a new table property (e.g., identity.column=true) to
   >      enable identity columns on specific fields.
   >    - Define syntax for identity column declaration during table creation
   >      (e.g., CREATE TABLE ... (id BIGINT IDENTITY, name STRING)).
   >    - Implement automatic value generation for identity columns when an
   >      explicit value is not provided.
   >    - Ensure compatibility with Iceberg’s partitioning, snapshot
   >      isolation, and metadata management.
   >
   > *Alternatives Considered*:
   >
   >    - Using externally managed sequences or UUIDs, but these approaches
   >      introduce additional complexity and overhead.
   >    - Leveraging application-side logic to generate unique values, which
   >      is not as efficient as native support.
   >
   > Query engine
   >
   > None
   > Willingness to contribute
   >
   >    - I can contribute this improvement/feature independently
   >    - I would be willing to contribute this improvement/feature with
   >    guidance from the Iceberg community
   >    - I cannot contribute this improvement/feature at this time
   >
   > nqvuong1998 created an issue (apache/iceberg#12297)
   > <https://github.com/apache/iceberg/issues/12297>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


