kevinjqliu commented on issue #584:
URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2041546390

   This was a super interesting deep dive. 
   
   So Iceberg has a little-known behavior: it transforms column names that 
contain special characters. As you see above, `TEST:A1B2.RAW.ABC-GG-1-A` is 
transformed into `TEST_x3AA1B2_x2ERAW_x2EABC_x2DGG_x2D1_x2DA`. This is 
mentioned in #83, which refers to the 
[AvroSchemaUtil::makeCompatibleName](https://github.com/apache/iceberg/blob/ad602a379584512d1d96eda557c20cf2af21d1b2/core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java#L429)
 function.
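   
   To make the transformation concrete, here is a rough Python sketch of the 
sanitization. It is my reading of the Java function, checked only against the 
example above; it is not PyIceberg's API. The names `sanitize_char` and 
`sanitize_name` are made up for illustration, and Python's `isalpha`/`isalnum` 
may not match Java's character classes exactly for non-ASCII input.

```python
def sanitize_char(c: str) -> str:
    """Escape one character (assumed rule: digits get a leading underscore,
    everything else becomes _x followed by its uppercase hex code point)."""
    if c.isdigit():
        return "_" + c
    return "_x" + format(ord(c), "X")


def sanitize_name(name: str) -> str:
    """Hypothetical helper: make a column name Avro-compatible."""
    if not name:
        return name
    first = name[0]
    # The first character must be a letter or underscore.
    out = [first if (first.isalpha() or first == "_") else sanitize_char(first)]
    # Later characters may also be digits.
    for c in name[1:]:
        out.append(c if (c.isalnum() or c == "_") else sanitize_char(c))
    return "".join(out)


print(sanitize_name("TEST:A1B2.RAW.ABC-GG-1-A"))
# TEST_x3AA1B2_x2ERAW_x2EABC_x2DGG_x2D1_x2DA
```

   Here `:` is code point 0x3A, `.` is 0x2E and `-` is 0x2D, which is where the 
`_x3A`, `_x2E` and `_x2D` fragments come from.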
   
   ### Java Iceberg Behavior
   When a column name contains a special character, Iceberg transforms the 
column name before writing to parquet. The resulting parquet file has the 
transformed column name, while Iceberg retains the original column name in the 
metadata. 
   When reading, Iceberg resolves the transformed column name in the parquet 
file to the original column. This is done by matching the column id. 
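   
   As a toy illustration of why matching on the column id makes the read side 
work even though the names differ, consider the sketch below. `Field` and 
`project_by_id` are hypothetical names, the field id `1` is invented, and none 
of this is Iceberg's or PyIceberg's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Field:
    field_id: int
    name: str


def project_by_id(table_fields: List[Field], file_fields: List[Field]) -> Dict[str, Optional[str]]:
    """Map each table column name to the column name stored in the data file,
    matching on field id rather than on name."""
    file_names_by_id = {f.field_id: f.name for f in file_fields}
    return {f.name: file_names_by_id.get(f.field_id) for f in table_fields}


# Table metadata keeps the original name; the data file holds the transformed one.
table_schema = [Field(1, "TEST:A1B2.RAW.ABC-GG-1-A")]
file_schema = [Field(1, "TEST_x3AA1B2_x2ERAW_x2EABC_x2DGG_x2D1_x2DA")]

mapping = project_by_id(table_schema, file_schema)
print(mapping)
```

   Because the lookup key is the id, the reader never needs the two names to 
be equal.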
   
   ### Python Iceberg Behavior
   The issue in PyIceberg here is not on the read side; it's on the write 
side! When an Iceberg table's column name has special characters, the parquet 
files should contain the transformed column name. Instead, PyIceberg writes the 
original column name, special characters included. 
   
   That is the issue above: there is a mismatch between the expected column 
name (transformed, `TEST_x3AA1B2_x2ERAW_x2EABC_x2DGG_x2D1_x2DA`) and the actual 
column name (untransformed, `TEST:A1B2.RAW.ABC-GG-1-A`).
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

