Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]

via GitHub Wed, 09 Apr 2025 10:14:13 -0700


0x26res commented on issue #278:
URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-2790415821


   Sorry, I'm not sure whether this is the right place to ask this question.
   
   My understanding from this conversation is that when a user provides a
   `pa.Schema` to create an Iceberg table, the field_ids get "refreshed" (i.e.,
   monotonically increasing field_ids are assigned).
   
   But it seems to me that field_ids are actually refreshed no matter what one
   does, in particular even when the user provides explicit field_id values:
   
   ```python
   from pyiceberg.catalog.memory import InMemoryCatalog
   from pyiceberg.schema import Schema
   from pyiceberg.types import (
       NestedField,
       StringType,
       TimestampType,
   )
   
   
   def test_id_refreshed():
       schema = Schema(
           NestedField(
               field_id=1, name="datetime", field_type=TimestampType(), required=True
           ),
           NestedField(field_id=3, name="symbol", field_type=StringType(), required=True),
       )
   
       catalog = InMemoryCatalog("test")
       catalog.create_namespace("test_namespace")
   
       table = catalog.create_table(
           identifier="test_namespace.bids",
           schema=schema,
       )
   
       # IDs have been refreshed:
       assert table.schema() == Schema(
           NestedField(
               field_id=1, name="datetime", field_type=TimestampType(), required=True
           ),
           # Expected field_id=3 (as provided), but the table has field_id=2
           NestedField(field_id=2, name="symbol", field_type=StringType(), required=True),
       )
   
   ```
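
To spell out what I think is happening: the result above is consistent with the IDs being renumbered sequentially in declaration order, regardless of what was supplied. Below is a minimal pure-Python sketch of that behaviour. `Field` and `refresh_ids` are hypothetical names of mine, not pyiceberg APIs, and I have not checked this against the actual implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a top-level NestedField (no nesting handled).
    field_id: int
    name: str


def refresh_ids(fields):
    """Renumber fields 1, 2, 3, ... in declaration order, ignoring given IDs."""
    return [replace(f, field_id=i) for i, f in enumerate(fields, start=1)]


# The IDs I asked for: 1 and 3 (leaving a gap at 2).
requested = [Field(1, "datetime"), Field(3, "symbol")]
refreshed = refresh_ids(requested)
print(refreshed)
```

Under this model the requested ID 3 for `symbol` becomes 2, which is what the test above observes.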
   
   Is that the intention? How can one control the field_ids within a schema?
   
   For context, I'm trying to make the field_ids match the field numbers from my
   protobuf schema (which I use as the source of truth for my schema).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

