0x26res commented on issue #278: URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-2790415821
Sorry, I'm not sure if this is the right place to ask this question.

My understanding from this conversation is that when a user provides a `pa.Schema` to create an Iceberg table, the field IDs get "refreshed" (i.e., we assign monotonically increasing field IDs). But it seems to me that field IDs are actually refreshed no matter what one does, including when the user provides explicit values for them:

```python
from pyiceberg.catalog.memory import InMemoryCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import (
    NestedField,
    StringType,
    TimestampType,
)


def test_id_refreshed():
    schema = Schema(
        NestedField(
            field_id=1, name="datetime", field_type=TimestampType(), required=True
        ),
        NestedField(field_id=3, name="symbol", field_type=StringType(), required=True),
    )

    catalog = InMemoryCatalog("test")
    catalog.create_namespace("test_namespace")
    table = catalog.create_table(
        identifier="test_namespace.bids",
        schema=schema,
    )

    # IDs have been refreshed:
    assert table.schema() == Schema(
        NestedField(
            field_id=1, name="datetime", field_type=TimestampType(), required=True
        ),
        # The field_id should be 3, as passed in above
        NestedField(field_id=2, name="symbol", field_type=StringType(), required=True),
    )
```

Is that the intention? How can one control the field IDs in a schema?

For context, I'm trying to have each field_id match the field number from my protobuf schema (which I use as the source of truth for my schema).
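To make the observed behavior concrete, here is a minimal, simplified model of what "refreshing" IDs means for a flat schema: IDs are reassigned in declaration order starting from 1, and the caller-supplied IDs are ignored. This is an illustrative sketch, not PyIceberg's implementation; `Field` and `refresh_ids` are hypothetical names, and nested/struct fields are not handled here.

```python
from dataclasses import dataclass, replace


# Hypothetical stand-in for a schema field (NOT pyiceberg's NestedField).
@dataclass(frozen=True)
class Field:
    field_id: int
    name: str


def refresh_ids(fields: list[Field], start: int = 1) -> list[Field]:
    """Reassign field IDs in declaration order, discarding the IDs passed in.

    Simplified model of the behavior seen in the test above; only flat
    (non-nested) schemas are covered.
    """
    return [replace(f, field_id=start + i) for i, f in enumerate(fields)]


# IDs chosen to mirror protobuf field numbers: 1 and 3 (2 is skipped).
requested = [Field(1, "datetime"), Field(3, "symbol")]
refreshed = refresh_ids(requested)
print([(f.field_id, f.name) for f in refreshed])
# → [(1, 'datetime'), (2, 'symbol')]
```

Under this model, the gap left by protobuf field number 2 is closed, which is why `symbol` ends up with ID 2 instead of 3.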