kevinjqliu commented on issue #1798: URL: https://github.com/apache/iceberg-python/issues/1798#issuecomment-2764711648
There's a bug somewhere in the schema translation between the pyarrow schema and the Iceberg schema. Note that the Iceberg table schema has an unexpected `field_id=2`: after loading the table from the catalog, `my_list` has `field_id=2` and `element_id=3` instead of the original `field_id=3` and `element_id=45`.

Output:
```
>>> schema
Schema(NestedField(field_id=1, name='name', field_type=StringType(), required=False), NestedField(field_id=3, name='my_list', field_type=ListType(type='list', element_id=45, element_type=StringType(), element_required=False), required=False), schema_id=0, identifier_field_ids=[])
>>> pyarrow_schema
name: large_string
  -- field metadata --
  PARQUET:field_id: '1'
my_list: large_list<element: large_string>
  child 0, element: large_string
    -- field metadata --
    PARQUET:field_id: '45'
  -- field metadata --
  PARQUET:field_id: '3'
>>> catalog.load_table("test.table").schema()
Schema(NestedField(field_id=1, name='name', field_type=StringType(), required=False), NestedField(field_id=2, name='my_list', field_type=ListType(type='list', element_id=3, element_type=StringType(), element_required=False), required=False), schema_id=0, identifier_field_ids=[])
>>> from pyiceberg.io.pyarrow import pyarrow_to_schema
>>> pyarrow_to_schema(pyarrow_schema, name_mapping=schema.name_mapping)
Schema(NestedField(field_id=1, name='name', field_type=StringType(), required=False), NestedField(field_id=3, name='my_list', field_type=ListType(type='list', element_id=45, element_type=StringType(), element_required=False), required=False), schema_id=0, identifier_field_ids=[])
```

Reproduce:
```
# schema difference
from pyiceberg.catalog import load_catalog
import pyarrow as pa
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType, ListType
from pyiceberg.io.pyarrow import schema_to_pyarrow

catalog = load_catalog(**dict(type="in-memory"))
schema = Schema(
    NestedField(field_id=1, name="name", field_type=StringType(), required=False),
    NestedField(
        field_id=3,
        name="my_list",
        field_type=ListType(
            element_id=45, element=StringType(), element_required=False
        ),
        required=False,
    ),
)
pyarrow_schema = schema_to_pyarrow(schema)

# create table
catalog.create_namespace_if_not_exists("test")
catalog.create_table_if_not_exists("test.table", pyarrow_schema)

# iceberg schema
catalog.load_table("test.table").schema()

# pyarrow to iceberg schema
from pyiceberg.io.pyarrow import pyarrow_to_schema

pyarrow_to_schema(pyarrow_schema, name_mapping=schema.name_mapping)
```
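To make the mismatch easier to see at a glance, here is a minimal sketch that diffs the field IDs by field path. It works on plain dictionaries transcribed by hand from the output above rather than on pyiceberg objects, and `diff_field_ids` is a hypothetical helper for illustration, not part of pyiceberg.

```python
from typing import Dict, Optional, Tuple

# Field IDs keyed by field path, copied by hand from the REPL output in this comment.
# These dicts and the helper below are illustrative only; nothing here is pyiceberg API.
expected_ids = {"name": 1, "my_list": 3, "my_list.element": 45}  # original `schema`
loaded_ids = {"name": 1, "my_list": 2, "my_list.element": 3}  # schema of the loaded table


def diff_field_ids(
    expected: Dict[str, int], actual: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {path: (expected_id, actual_id)} for every path whose IDs differ."""
    paths = sorted(set(expected) | set(actual))
    return {
        path: (expected.get(path), actual.get(path))
        for path in paths
        if expected.get(path) != actual.get(path)
    }


mismatches = diff_field_ids(expected_ids, loaded_ids)
for path, (want, got) in mismatches.items():
    print(f"{path}: expected field_id={want}, loaded table has field_id={got}")
```

Only `my_list` and its element are reported; `name` keeps `field_id=1` in both schemas.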