kevinjqliu commented on issue #584: URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2041546390
This was a super interesting deep dive. Iceberg has an obscure behavior: it transforms column names that contain special characters. As you see above, `TEST:A1B2.RAW.ABC-GG-1-A` is transformed into `TEST_x3AA1B2_x2ERAW_x2EABC_x2DGG_x2D1_x2DA`. This is mentioned in #83 and refers to the [AvroSchemaUtil::makeCompatibleName](https://github.com/apache/iceberg/blob/ad602a379584512d1d96eda557c20cf2af21d1b2/core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java#L429) function.

### Java Iceberg Behavior

When a column name contains a special character, Iceberg transforms the name before writing to parquet. The resulting parquet file has the transformed column name, while Iceberg retains the original column name in the metadata.

When reading, Iceberg performs the same transformation to read the transformed column name. This is done by matching the column id.

### Python Iceberg Behavior

The issue in PyIceberg here is not the read side, it's the write side! When an Iceberg table's column name has special characters, the parquet files should contain the transformed column name. Instead, PyIceberg writes the column name with the special characters left as they are.

That is the issue above: there is a mismatch between the expected column name (transformed, `TEST_x3AA1B2_x2ERAW_x2EABC_x2DGG_x2D1_x2DA`) and the actual column name (untransformed, `TEST:A1B2.RAW.ABC-GG-1-A`).
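To make the transformation concrete, here is a minimal Python sketch of the renaming rule as it can be inferred from the example above: each character that is not a letter, digit, or underscore becomes `_x` followed by its uppercase hex code point. The function names `sanitize_column_name` and `_sanitize_char` are hypothetical and are not part of the PyIceberg API. The handling of a leading digit is my reading of the linked Java function and is not confirmed by the example. Python's `str.isalpha`/`str.isalnum` may also classify some Unicode characters differently than Java's `Character` methods, so treat this as an approximation for ASCII names.

```python
def _sanitize_char(ch: str) -> str:
    """Escape one character that is not allowed at its position (hypothetical helper)."""
    if ch.isdigit():
        # Assumption: a digit is only escaped when it is the first character,
        # in which case it is prefixed with an underscore.
        return "_" + ch
    # Other characters become _x<HEX>, e.g. ':' -> '_x3A'.
    return "_x" + format(ord(ch), "X")


def sanitize_column_name(name: str) -> str:
    """Return a parquet/Avro-compatible version of an Iceberg column name (sketch)."""
    if not name:
        return name
    first = name[0]
    # The first character must be a letter or underscore.
    parts = [first if (first.isalpha() or first == "_") else _sanitize_char(first)]
    # Remaining characters may be letters, digits, or underscores.
    for ch in name[1:]:
        parts.append(ch if (ch.isalnum() or ch == "_") else _sanitize_char(ch))
    return "".join(parts)


if __name__ == "__main__":
    print(sanitize_column_name("TEST:A1B2.RAW.ABC-GG-1-A"))
```

On the column name from this issue, the sketch produces the transformed name quoted above, which is the name the parquet file is expected to contain.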