cgbur commented on issue #716: URL: https://github.com/apache/iceberg-python/issues/716#issuecomment-2101251170
Ah, confusingly, there appear to be writer differences that cause the issue. My Rust pyarrow implementation matches when polars has `use_pyarrow=True`.

```python
import polars as pl
import pyarrow.parquet as pq

df = pl.DataFrame(
    {
        "a": [[{"a": 1}, {"a": 2}], [{"a": 3}]],
    }
)


def print_schema_path(path, col_name):
    # Print path_in_schema for every column chunk whose path starts with col_name.
    metadata = pq.read_metadata(path)
    for group_number in range(metadata.num_row_groups):
        row_group = metadata.row_group(group_number)
        for column_number in range(row_group.num_columns):
            column = row_group.column(column_number)
            if column.path_in_schema.startswith(col_name):
                print(f"path_in_schema: {column.path_in_schema}")


# Write with the native polars writer.
df.write_parquet("example.parquet", use_pyarrow=False)
print("with polars")
print(pq.read_schema("example.parquet"))
print_schema_path("example.parquet", "a")

# Write the same frame through pyarrow.
df.write_parquet("example.parquet", use_pyarrow=True)
print("with pyarrow")
print(pq.read_schema("example.parquet"))
print_schema_path("example.parquet", "a")
```

```
with polars
a: large_list<item: struct<a: int64>>
  child 0, item: struct<a: int64>
      child 0, a: int64
path_in_schema: a.list.item.a
with pyarrow
a: large_list<element: struct<a: int64>>
  child 0, element: struct<a: int64>
      child 0, a: int64
path_in_schema: a.list.element.a
```

Perhaps the visitor is not respecting the name used in the schema? Or is there a mismatch in the method used to acquire it between Iceberg and Parquet?
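To make the difference concrete: the two paths above differ only in the name of the list's element field (`item` vs. `element`). The sketch below shows one way to compare such paths while ignoring that name. `normalize_path` and `LIST_ELEMENT_NAMES` are hypothetical helpers written for illustration; they are not part of pyiceberg, pyarrow, or polars, and this is not a claim about how the visitor resolves names.

```python
# Hypothetical helper for illustration only; not a pyiceberg or pyarrow API.
# Assumption: the element field of a list appears directly after a "list"
# segment and is named either "item" or "element".
LIST_ELEMENT_NAMES = {"item", "element"}


def normalize_path(path_in_schema: str, canonical: str = "element") -> str:
    """Rewrite the segment that follows a `list` segment to one canonical name."""
    parts = path_in_schema.split(".")
    for i in range(1, len(parts)):
        if parts[i - 1] == "list" and parts[i] in LIST_ELEMENT_NAMES:
            parts[i] = canonical
    return ".".join(parts)


polars_path = "a.list.item.a"
pyarrow_path = "a.list.element.a"

# The raw paths differ, but they agree once the element name is normalized.
print(polars_path == pyarrow_path)  # False
print(normalize_path(polars_path) == normalize_path(pyarrow_path))  # True
```

This is purely string-based, so it would also rewrite a real struct field called `item` nested under a field called `list`, and it does not handle field names that themselves contain dots. A robust fix would more likely resolve the element field from the schema itself rather than from its name.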