kevinjqliu commented on issue #1798:
URL:
https://github.com/apache/iceberg-python/issues/1798#issuecomment-2764704243
I suspect the issue is with the schema definition:
```
schema = Schema(
    NestedField(field_id=1, name="name", field_type=StringType(), required=False),
    NestedField(
        field_id=3,
        name="my_list",
        field_type=ListType(
            element_id=45, element=StringType(), element_required=False
        ),
        required=False,
    ),
)
```
or with how we handle the schema conversion internally, between the Iceberg
schema and the PyArrow schema.
For example, using the example Iceberg schema provided, I get a schema
mismatch:
```
# not working
import pyarrow as pa

from pyiceberg.catalog import load_catalog
from pyiceberg.io.pyarrow import schema_to_pyarrow
from pyiceberg.schema import Schema
from pyiceberg.types import ListType, NestedField, StringType

catalog = load_catalog(**dict(type="in-memory"))

schema = Schema(
    NestedField(field_id=1, name="name", field_type=StringType(), required=False),
    NestedField(
        field_id=3,
        name="my_list",
        field_type=ListType(
            element_id=45, element=StringType(), element_required=False
        ),
        required=False,
    ),
)
pyarrow_schema = schema_to_pyarrow(schema)

# create table
catalog.create_namespace_if_not_exists("test")
catalog.create_table_if_not_exists("test.table", pyarrow_schema)

# append data
df_1 = pa.Table.from_pylist([
    {"name": "one", "my_list": ["test"]},
    {"name": "another", "my_list": ["test"]},
], schema=pyarrow_schema)
catalog.load_table("test.table").append(df_1)
catalog.load_table("test.table").scan().to_arrow()

# append more data
df_2 = pa.Table.from_pylist([
    {"name": "one"},
    {"name": "another"},
], schema=pyarrow_schema)
catalog.load_table("test.table").append(df_2)
catalog.load_table("test.table").scan().to_arrow()
```
```
ValueError: Mismatch in fields:
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Table field                       ┃ Dataframe field                   ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ ✅ │ 1: name: optional string          │ 1: name: optional string          │
│ ✅ │ 2: my_list: optional list<string> │ Missing                           │
│ ❌ │ 3: element: optional string       │ 3: my_list: optional list<string> │
└────┴───────────────────────────────────┴───────────────────────────────────┘
```
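One possible reading of the mismatch table (an assumption, not confirmed against the pyiceberg source): the table was created with freshly assigned field IDs (`name` = 1, `my_list` = 2, list element = 3), while the dataframe still carries the IDs declared in the schema (`name` = 1, `my_list` = 3, list element = 45). The toy model below uses only the standard library and is not pyiceberg code. The helper `assign_fresh_ids` is hypothetical and simply numbers top-level fields first, then list elements, which reproduces the IDs shown in the "Table field" column.

```python
# Toy model (NOT pyiceberg code): reassign field IDs in the order the
# mismatch table suggests, top-level fields first, then list elements.
from itertools import count

# (field_id, name, element_id or None), copied from the schema definition
declared = [(1, "name", None), (3, "my_list", 45)]


def assign_fresh_ids(fields):
    """Hypothetical helper: renumber fields consecutively starting at 1."""
    next_id = count(1)
    # First pass: top-level fields get consecutive IDs.
    top = [(next(next_id), name, elem) for _, name, elem in fields]
    # Second pass: list elements get the IDs that follow.
    return [
        (fid, name, next(next_id) if elem is not None else None)
        for fid, name, elem in top
    ]


fresh = assign_fresh_ids(declared)
for (old_id, name, old_elem), (new_id, _, new_elem) in zip(declared, fresh):
    print(f"{name}: declared id={old_id}, fresh id={new_id}, "
          f"declared element id={old_elem}, fresh element id={new_elem}")
```

Under this reading, the declared ID of `my_list` (3) collides with the fresh ID of the list element (3), which would match the ❌ row in the table.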