syun64 commented on issue #716: URL: https://github.com/apache/iceberg-python/issues/716#issuecomment-2170966077
Hi @cgbur and @felixscherz, thank you for raising this and taking the investigation further. I'm not a polars user myself, but the difference in behavior is quite interesting, and I think there would be value in trying to fix this issue.

I just read the `write_table` API documentation in `pyarrow.parquet` and found something rather interesting: https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table

If you check the documentation for the `use_compliant_nested_type` flag, it mentions that using `element` as the name of the single item field is the Parquet-compliant format, as specified in the [Parquet Spec for Nested Types](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#nested-types). PyArrow enables this flag by default and writes the list element field name as `element`.

For some reason, it looks like polars has decided to use `item`, the non-Parquet-compliant list element name, instead. While I'm curious why the polars community went this route, I also think supporting both `element` and `item` as the name in the visitor may not be the worst thing, just to increase our scope of support.
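
To make the idea of accepting both names concrete, here is a minimal sketch. It is not the actual PyIceberg visitor: the `Field` class, the `LIST_ELEMENT_NAMES` tuple, and the `find_list_element` helper are all hypothetical names made up for illustration, and the real change would need to live wherever the visitor currently looks up the list element field.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Names accepted for the single child field of a list.
# "element" is the Parquet-compliant name; "item" is the name
# polars appears to write, per the discussion in this issue.
LIST_ELEMENT_NAMES = ("element", "item")


@dataclass
class Field:
    """Hypothetical stand-in for a schema node (not a PyIceberg or PyArrow class)."""

    name: str
    children: List["Field"] = field(default_factory=list)


def find_list_element(list_field: Field) -> Optional[Field]:
    """Return the list's element field, accepting either supported name."""
    for child in list_field.children:
        if child.name in LIST_ELEMENT_NAMES:
            return child
    return None


# A list written with the compliant name and one written with "item".
compliant = Field("list", [Field("element")])
polars_style = Field("list", [Field("item")])

print(find_list_element(compliant).name)
print(find_list_element(polars_style).name)
```

Keeping the accepted names in one tuple would make it easy to see, and later tighten, exactly which non-compliant names we tolerate.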