Fokko commented on issue #584:
URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2041585527

   Oof, this is a big one. Thanks for reporting this, @gwindes, and thanks 
@kevinjqliu for jumping on this and getting to the bottom of it. I'm also 
looping in @HonahX here since we want to include this fix in 0.6.1.
   
   > @Fokko Are you familiar with this behavior? I can't find any 
documentation on it. The original PR by Ryan 
(https://github.com/apache/iceberg/pull/601) suggests that this was done to be 
compatible with Avro, since [Avro spec does not allow special 
characters](https://avro.apache.org/docs/1.11.1/specification/#names).
   
   I don't think Avro is the issue, since we reference the fields by field-id.
   
   - For writing, we want to have the same behavior as Java.
   - For reading, PyIceberg has an additional step: we read the Parquet files 
using the original column names, and we rename the fields [afterward in this 
visitor](https://github.com/apache/iceberg-python/blob/4148edb5e28ae88024a55e0b112238e65b873957/pyiceberg/io/pyarrow.py#L1137).
 We could correct it there, but we want to make sure that we don't write any 
invalid Parquet field names in the first place.
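
   To make the two sides concrete, here is a rough sketch, not PyIceberg's actual code. `sanitize_name` and `rename_by_field_id` are hypothetical helpers, and the escaping rule (characters outside the Avro name rule become `_x<HEX>`, a leading digit gets a `_` prefix) is an assumption about the Java behavior that should be checked against the Java implementation. The point it illustrates is that the file can carry sanitized names while the reader maps them back through the field-id.

```python
from __future__ import annotations


def sanitize_name(name: str) -> str:
    """Hypothetical write-side sanitizer (assumed rule, not PyIceberg's API).

    Keeps names matching [A-Za-z_][A-Za-z0-9_]* untouched and escapes
    every other character.
    """

    def escape(ch: str) -> str:
        # Assumption: digits get a "_" prefix, anything else becomes _x<HEX>.
        return "_" + ch if ch.isdigit() else f"_x{ord(ch):X}"

    out = []
    for i, ch in enumerate(name):
        is_letter = ch.isascii() and ch.isalpha()
        is_digit = ch.isascii() and ch.isdigit()
        # A digit is only valid after the first position.
        valid = ch == "_" or is_letter or (i > 0 and is_digit)
        out.append(ch if valid else escape(ch))
    return "".join(out)


def rename_by_field_id(
    file_columns: dict[int, str],
    table_columns: dict[int, str],
    row: dict[str, object],
) -> dict[str, object]:
    """Hypothetical read-side step: map a row keyed by the file's column
    names back to the table's column names, matching on field-id only."""
    return {
        table_columns[field_id]: row[file_name]
        for field_id, file_name in file_columns.items()
        if field_id in table_columns
    }


# field-id -> column name as defined in the table schema
table_columns = {1: "id", 2: "total.amount"}
# field-id -> column name as written to the data file
file_columns = {fid: sanitize_name(name) for fid, name in table_columns.items()}

row_from_file = {file_columns[1]: 7, file_columns[2]: 12.5}
row_for_user = rename_by_field_id(file_columns, table_columns, row_from_file)
```

   Because the lookup goes through the field-id, the reader never has to reverse the escaping; it only needs the table schema.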



