syun64 commented on issue #716:
URL: https://github.com/apache/iceberg-python/issues/716#issuecomment-2170966077

   Hi @cgbur and @felixscherz, thank you for raising this and taking the 
investigation further. I'm not a polars user myself, but the difference in 
behavior is quite interesting, and I think there would be value in trying to 
fix this issue.
   
   I just read the `write_table` API documentation in `pyarrow.parquet` and 
found something rather interesting:
   
   
https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table
   
   If you check the documentation for the `use_compliant_nested_type` flag, it 
mentions that using `element` as the single-item field name is the 
Parquet-compliant format, as specified in the [Parquet Spec for Nested 
Types](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#nested-types).
PyArrow enables this flag by default and writes the list element field name as 
`element`.
   
   For some reason, it looks like polars has decided to use `item`, the 
non-Parquet-compliant list element name, instead. While I'm curious about why 
the polars community has decided to go this route, I also think supporting both 
the `element` and `item` names in the visitor may not be the worst thing, just 
to increase our scope of support.
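
   To make the idea concrete, here is a minimal sketch of what accepting both names could look like. The helper `list_element_path` and the constant `ACCEPTED_LIST_ELEMENT_NAMES` are hypothetical names I made up for illustration; this is not the existing visitor code, just the lookup logic under the assumption that the visitor knows the child field names of the repeated `list` group.

```python
# Hypothetical helper for illustration only; not part of PyIceberg's API.
# "element" is the Parquet-compliant name, "item" is the name polars writes.
ACCEPTED_LIST_ELEMENT_NAMES = ("element", "item")


def list_element_path(parent_path, child_names):
    """Return the dotted column path of a list's element field.

    parent_path: dotted path of the list column, e.g. "xs".
    child_names: names of the fields found under the repeated "list" group.
    """
    # Check the compliant name first, then fall back to the other one.
    for name in ACCEPTED_LIST_ELEMENT_NAMES:
        if name in child_names:
            return f"{parent_path}.list.{name}"
    raise ValueError(
        f"No list element field under {parent_path!r}; "
        f"expected one of {ACCEPTED_LIST_ELEMENT_NAMES}, got {list(child_names)}"
    )


compliant_path = list_element_path("xs", ["element"])  # file written by PyArrow
polars_path = list_element_path("xs", ["item"])        # file written by polars
```

   Checking `element` first keeps the spec-compliant name as the preferred one, and any other name still fails loudly instead of being silently accepted.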


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

