kevinjqliu commented on issue #584: URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2044152708
> I would argue that the Python one is correct

Yeah, me too. But I think Java Iceberg doesn't support this, since Parquet files with an `ABC-GG-1-A` column will be read as the Iceberg column `ABC_x2DGG_x2D1_x2DA`. I think it's worth opening an issue to track this in the main Iceberg repo.

PyIceberg can already read a Parquet file with a "sanitized" column name; this is handled by #83. [file_project_schema](https://github.com/apache/iceberg-python/pull/83/files#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdR834), which is the schema with "sanitized" column names, is used to read the Parquet files.

So we just need to ensure that PyIceberg writes Parquet files with the same behavior as Java Iceberg. #590 verifies that fixing the write behavior will fix the entire roundtrip described in the issue.
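
For illustration, here is a rough sketch of the kind of name sanitization that turns `ABC-GG-1-A` into `ABC_x2DGG_x2D1_x2DA`. The function name `sanitize_column_name` is hypothetical, and the rules (escape any character that is not a letter, digit, or underscore as `_x` plus its uppercase hex code point; prefix a leading digit with `_`) are my understanding of the Avro-style sanitization, not a copy of the Java Iceberg or PyIceberg code.

```python
def sanitize_column_name(name: str) -> str:
    """Hypothetical helper: approximate Avro-style column name sanitization."""

    def escape(ch: str) -> str:
        # A digit can only reach this point as the first character,
        # where it is not allowed; prefix it with an underscore.
        if ch.isdigit():
            return "_" + ch
        # Any other invalid character becomes _x followed by its
        # uppercase hex code point, e.g. "-" -> "_x2D".
        return "_x" + format(ord(ch), "X")

    if not name:
        return name

    first = name[0]
    # The first character must be a letter or an underscore.
    parts = [first if (first.isalpha() or first == "_") else escape(first)]
    # Later characters may also be digits.
    for ch in name[1:]:
        parts.append(ch if (ch.isalnum() or ch == "_") else escape(ch))
    return "".join(parts)


print(sanitize_column_name("ABC-GG-1-A"))  # ABC_x2DGG_x2D1_x2DA
```

If the writer applies the same mapping to column names before writing the Parquet file, the file schema should line up with what the read path already expects.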