liamphmurphy opened a new issue, #43893:
URL: https://github.com/apache/arrow/issues/43893
### Describe the bug, including details regarding any error messages, version, and platform.
After a schema merge operation involving nested (struct) columns, PyArrow
fails to load the data and raises the following error:
`pyarrow.lib.ArrowTypeError: struct fields don't match or are in the wrong
order: Input fields: struct<c: int64> output fields: struct<c: int64, d: int64>`
I have confirmed this does not happen with a schema merge that does not
involve any nested columns.
I believe this is a PyArrow-specific problem, as Spark does not exhibit it.
Below is an example of how this can be reproduced:
```
import pyarrow as pa
import polars as pl
from deltalake import write_deltalake

# Create a PyArrow table with a nested (struct) column 'b' containing field 'c'
df = pa.table({
    "a": [1, 2, 3],
    "b": [{"c": 1}, {"c": 2}, {"c": 3}],
})

# Create the matching PyArrow schema: 'b' is a struct with a single field 'c'
schema = pa.schema([
    pa.field("a", pa.int64()),
    pa.field("b", pa.struct([
        pa.field("c", pa.int64()),
    ])),
])

local_path = "./tables/merge_delta_table"

# Write the table to delta lake
write_deltalake(local_path, data=df, engine="rust", schema=schema,
                mode="append")

# Create a new table with a different schema, adding a nested field 'd'
# to the struct column 'b'
df2 = pa.table({
    "a": [4, 5, 6],
    "b": [{"d": 2, "c": 1}, {"c": 2}, {"c": 3}],
})

schema2 = pa.schema([
    pa.field("a", pa.int64()),
    pa.field("b", pa.struct([
        pa.field("d", pa.int64()),
        pa.field("c", pa.int64()),
    ])),
])

# Write the new table to the same delta lake, merging the schemas
write_deltalake(local_path, data=df2, schema=schema2, engine="rust",
                mode="append", schema_mode="merge")

# Now read the delta lake using polars; this is where the error is raised
df = pl.read_delta(local_path)
print(df)
```
### Component(s)
Parquet, Python