kevinjqliu commented on issue #584: URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2041559077

> Further research shows that when I use [daft](https://www.getdaft.io/projects/docs/en/latest/user_guide/integrations/iceberg.html#reading-a-table), I'm able to read and use the to_arrow() functionality just fine. This is interesting especially because daft utilizes pyiceberg.

The column name transformation behavior is part of the Java Iceberg spec when reading/writing Parquet files. Specifically, the transformed schema is pushed down to the Parquet reader/writer. I suspect this happens because the Java Parquet implementation supports both Avro and Parquet schemas (see [parquet cli](https://github.com/apache/parquet-mr/blob/db4183109d5b734ec5930d870cdae161e408ddba/parquet-cli/src/main/java/org/apache/parquet/cli/commands/SchemaCommand.java#L106-L111)), so the column name transformation is applied to stay compatible with both Parquet and Avro schemas.

From what I've seen, libraries in other languages do not do this, which means they can read and write Parquet files whose column names contain special characters. Daft uses the Rust Arrow library, which can read Parquet files with special characters in their column names, and pyarrow can read them as well. I checked the major Parquet libraries in Python, Rust, and Golang, and they all support reading special characters in Parquet column names.
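To illustrate the last point, here is a minimal sketch (not part of the original comment) showing pyarrow round-tripping a Parquet file whose column names contain special characters; the file name and column names are made up for illustration, and no name transformation is applied at any point:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical table whose column names would be invalid Avro identifiers
# (they contain spaces, dots, and dashes).
table = pa.table({
    "user id": [1, 2, 3],
    "total.amount": [10.5, 20.0, 7.25],
    "first-name": ["a", "b", "c"],
})

# pyarrow writes and reads the column names verbatim.
pq.write_table(table, "special_columns.parquet")
round_tripped = pq.read_table("special_columns.parquet")
print(round_tripped.column_names)  # ['user id', 'total.amount', 'first-name']
```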
> Further research shows that when I use [daft](https://www.getdaft.io/projects/docs/en/latest/user_guide/integrations/iceberg.html#reading-a-table) that I'm able to read and use the to_arrow() functionality just fine. This is interesting especially because daft utilizes pyiceberg. The column name transformation behavior is part of the Java Iceberg spec when reading/writing parquet files. Specifically, the transformed schema is pushed down to parquet reader/writer. I suspect this is happening since the Java parquet implementation supports both Avro and parquet schema (See [parquet cli](https://github.com/apache/parquet-mr/blob/db4183109d5b734ec5930d870cdae161e408ddba/parquet-cli/src/main/java/org/apache/parquet/cli/commands/SchemaCommand.java#L106-L111)). So to be compatible with both parquet and Avro schemas, this column name transformation behavior is used. From what I've seen, libraries in other languages do not do this. This means these libraries can read/write parquet files having special characters in their column names. Daft uses the Rust Arrow library which can read parquet files with special characters in their column names. Similarly, pyarrow can read it as well. I checked major parquet libraries in Python, Rust, Golang and they can all support reading special characters in parquet column names. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org