Fokko commented on issue #584: URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2041585527
Oof, this is a big one. Thanks for reporting this @gwindes, and thanks @kevinjqliu for jumping on this and getting to the bottom of it. I'm also looping in @HonahX here since we want to include this fix in 0.6.1.

> @Fokko Are you familiar with this behavior? I can't find any documentations on it. The original PR by Ryan (https://github.com/apache/iceberg/pull/601) suggests that this was done to be compatible with Avro, since [Avro spec does not allow special characters](https://avro.apache.org/docs/1.11.1/specification/#names).

I don't think Avro is the issue, since we reference the fields using the field-id.

- For writing, we want to have the same behavior as Java.
- For reading, PyIceberg has an additional step: we read the Parquet files using the original column names, and we rename the fields [afterward in this visitor](https://github.com/apache/iceberg-python/blob/4148edb5e28ae88024a55e0b112238e65b873957/pyiceberg/io/pyarrow.py#L1137). We could correct it there, but we want to make sure that we don't write any invalid Parquet filenames in the first place.
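For illustration, here is a minimal sketch of the kind of Avro-compatible name sanitization under discussion. It is not PyIceberg's or Iceberg Java's actual code: the function name `sanitize_name` is hypothetical, and the escape scheme (`_x` plus the uppercase hex code point for a disallowed character, `_` in front of a leading digit) is an assumption about the Java writer's behavior that should be checked against the linked PR. The only rule taken as given is the Avro one: a name starts with `[A-Za-z_]` and continues with `[A-Za-z0-9_]`.

```python
def _is_allowed(ch: str, first: bool) -> bool:
    """Avro name rule: [A-Za-z_] first, then [A-Za-z0-9_]."""
    if ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z":
        return True
    return (not first) and "0" <= ch <= "9"


def sanitize_name(name: str) -> str:
    """Hypothetical helper: escape characters that Avro names disallow.

    Assumed scheme (not confirmed against the Java implementation):
    a leading digit gets a '_' prefix, any other disallowed character
    becomes '_x' followed by its uppercase hex code point.
    """
    out = []
    for i, ch in enumerate(name):
        if _is_allowed(ch, first=(i == 0)):
            out.append(ch)
        elif "0" <= ch <= "9":
            # Only reachable for a digit in the first position.
            out.append("_" + ch)
        else:
            out.append(f"_x{ord(ch):X}")
    return "".join(out)


if __name__ == "__main__":
    for column in ["valid_name1", "a.b", "9lives", "TEST:A1B2.RAW"]:
        print(column, "->", sanitize_name(column))
```

Because readers resolve columns by field-id, a writer that applies such a mapping only needs to be consistent with Java on the written names; the reader can still rename fields back to the table schema's names afterward.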