kevinjqliu commented on issue #1798:
URL: https://github.com/apache/iceberg-python/issues/1798#issuecomment-2764704243

   I suspect the issue is with the schema definition 
   ```
   schema = Schema(
       NestedField(field_id=1, name="name", field_type=StringType(), required=False),
       NestedField(
           field_id=3,
           name="my_list",
           field_type=ListType(
               element_id=45, element=StringType(), element_required=False
           ),
           required=False,
       ),
   )
   ```
   or with how we internally convert between the Iceberg schema and the PyArrow schema.
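One way to read this hypothesis: the schema declares non-sequential field IDs (1 for `name`, 3 for `my_list`, 45 for the list element). The "Table field" column in the mismatch output further down reports 1, 2 and 3 instead. That suggests the IDs are renumbered when the table is created, while the PyArrow schema built by `schema_to_pyarrow` still carries the declared IDs. The toy model below is plain Python, not PyIceberg code. `ToyField`, `renumber` and `flatten` are hypothetical helpers, and numbering top-level fields first is an assumption that happens to match this schema.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyField:
    """Toy stand-in for an Iceberg field: an ID, a name, and nested children."""
    field_id: int
    name: str
    children: list["ToyField"] = field(default_factory=list)


def renumber(fields: list[ToyField]) -> list[ToyField]:
    """Hypothetical helper: assign fresh IDs starting at 1, ignoring the declared ones."""
    ids = count(1)

    def assign(level: list[ToyField]) -> list[ToyField]:
        # Number every field at this level before descending into its children,
        # so top-level columns get the lowest IDs.
        numbered = [ToyField(next(ids), f.name, f.children) for f in level]
        return [ToyField(f.field_id, f.name, assign(f.children)) for f in numbered]

    return assign(fields)


def flatten(fields: list[ToyField], prefix: str = "") -> dict[str, int]:
    """Map dotted field paths to IDs, e.g. {"my_list.element": 45}."""
    out: dict[str, int] = {}
    for f in fields:
        path = prefix + f.name
        out[path] = f.field_id
        out.update(flatten(f.children, path + "."))
    return out


# IDs as declared in the issue's schema: name=1, my_list=3, element=45.
declared = [
    ToyField(1, "name"),
    ToyField(3, "my_list", [ToyField(45, "element")]),
]

print(flatten(declared))            # {'name': 1, 'my_list': 3, 'my_list.element': 45}
print(flatten(renumber(declared)))  # {'name': 1, 'my_list': 2, 'my_list.element': 3}
```

The renumbered IDs line up with the "Table field" column of the error (1: name, 2: my_list, 3: element). The declared IDs line up with the "Dataframe field" column (3: my_list).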
   
   For example, using the example Iceberg schema provided, I get a schema mismatch:
   ```
   
   # Not working: fails with the ValueError shown below
   from pyiceberg.catalog import load_catalog
   import pyarrow as pa
   from pyiceberg.schema import Schema
   from pyiceberg.types import NestedField, StringType, ListType
   from pyiceberg.io.pyarrow import schema_to_pyarrow
   
   catalog = load_catalog(type="in-memory")
   
   schema = Schema(
       NestedField(field_id=1, name="name", field_type=StringType(), required=False),
       NestedField(
           field_id=3,
           name="my_list",
           field_type=ListType(
               element_id=45, element=StringType(), element_required=False
           ),
           required=False,
       ),
   )
   pyarrow_schema = schema_to_pyarrow(schema)
   
   # create table
   catalog.create_namespace_if_not_exists("test")
   catalog.create_table_if_not_exists("test.table", pyarrow_schema)
   
   # append data
   df_1 = pa.Table.from_pylist([
       {"name": "one", "my_list": ["test"]},
       {"name": "another", "my_list": ["test"]},
   ], schema=pyarrow_schema)
   catalog.load_table("test.table").append(df_1)
   catalog.load_table("test.table").scan().to_arrow()
   
   # append more data
   df_2 = pa.Table.from_pylist([
       {"name": "one"},
       {"name": "another"},
   ], schema=pyarrow_schema)
   catalog.load_table("test.table").append(df_2)
   catalog.load_table("test.table").scan().to_arrow()
   
   ```
   
   ```
   ValueError: Mismatch in fields:
   
   ┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
   ┃    ┃ Table field                       ┃ Dataframe field                   ┃
   ┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
   │ ✅ │ 1: name: optional string          │ 1: name: optional string          │
   │ ✅ │ 2: my_list: optional list<string> │ Missing                           │
   │ ❌ │ 3: element: optional string       │ 3: my_list: optional list<string> │
   └────┴───────────────────────────────────┴───────────────────────────────────┘
   ```
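The rows suggest that the compatibility check pairs fields by ID rather than by name. ID 2 has no counterpart on the dataframe side but is optional, so it passes. ID 3 is `element: string` in the table but `my_list: list<string>` in the dataframe, so it fails. The toy model below reproduces the three rows from the same IDs. It is plain Python: `compare_by_id` is a hypothetical helper, and comparing only the types is a simplification, not PyIceberg's exact rule.

```python
# Toy schemas keyed by field ID: field_id -> (name, type, required).
# Table side: the renumbered IDs reported in the "Table field" column.
table_fields = {
    1: ("name", "string", False),
    2: ("my_list", "list<string>", False),
    3: ("element", "string", False),
}
# Dataframe side: the IDs declared in the issue's schema.
dataframe_fields = {
    1: ("name", "string", False),
    3: ("my_list", "list<string>", False),
    45: ("element", "string", False),
}


def compare_by_id(table, dataframe):
    """Hypothetical helper: one (ok, field_id, table_name, dataframe_name) row per table field."""
    rows = []
    for field_id, (name, field_type, required) in table.items():
        other = dataframe.get(field_id)
        if other is None:
            # An optional table field may be absent from the data being written.
            rows.append((not required, field_id, name, None))
        else:
            # The same ID exists on both sides, so the types must agree.
            rows.append((other[1] == field_type, field_id, name, other[0]))
    return rows


for ok, field_id, table_name, df_name in compare_by_id(table_fields, dataframe_fields):
    print("OK " if ok else "BAD", field_id, table_name, df_name or "Missing")
```

In this model, the failure comes from the ID mismatch rather than the list type itself. Giving both sides the same IDs would make every row pass.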
   

