Re: [I] [BUG] Valid column characters fail on to_arrow() or to_pandas() ArrowInvalid: No match for FieldRef.Name [iceberg-python]

via GitHub Sun, 07 Apr 2024 13:03:48 -0700


Fokko commented on issue #584:
URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2041585527

Oof, this is a big one. Thanks for reporting this, @gwindes, and thanks
@kevinjqliu for jumping on this and getting to the bottom of it. I'm also
looping in @HonahX here since we want to include this fix in 0.6.1.

> @Fokko Are you familiar with this behavior? I can't find any
> documentations on it. The original PR by Ryan
> (https://github.com/apache/iceberg/pull/601) suggests that this was done to be
> compatible with Avro, since [Avro spec does not allow special
> characters](https://avro.apache.org/docs/1.11.1/specification/#names).

I don't think Avro is the issue, since we reference the fields by field-id.

- For writing, we want to have the same behavior as Java.
- For reading, PyIceberg has an additional step: we read the Parquet files
using the original column names, and we rename the fields [afterward in this
visitor](https://github.com/apache/iceberg-python/blob/4148edb5e28ae88024a55e0b112238e65b873957/pyiceberg/io/pyarrow.py#L1137).
We could correct it there, but we want to make sure that we don't write any
invalid Parquet field names in the first place.
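
To make the two points above concrete, here is a rough sketch in plain Python. It is not the PyIceberg or Java implementation: `sanitize_name` and `rename_by_field_id` are hypothetical helpers, and the `_x<HEX>` escaping scheme is an assumption chosen for illustration. The idea is that the writer stores a sanitized name in the file, and the reader maps columns back to the table's names by field-id rather than by name.

```python
import re


def sanitize_name(name: str) -> str:
    """Hypothetical sanitizer: escape every character outside [A-Za-z0-9_].

    The "_x<HEX>" scheme is an assumption for this sketch, not a statement
    about what Java or PyIceberg actually write.
    """
    def escape(match: re.Match) -> str:
        return "_x" + format(ord(match.group(0)), "X")

    sanitized = re.sub(r"[^A-Za-z0-9_]", escape, name)
    # Prefix names that start with a digit so they begin with "_".
    if sanitized and sanitized[0].isdigit():
        sanitized = "_" + sanitized
    return sanitized


def rename_by_field_id(file_columns: dict, table_fields: dict) -> dict:
    """Map columns read from a file back to the table's column names.

    file_columns: name-in-file -> (field_id, values)
    table_fields: field_id -> column name in the table schema
    The name stored in the file is ignored; only the field-id is used.
    """
    return {
        table_fields[field_id]: values
        for field_id, values in file_columns.values()
    }


# Table schema (hypothetical): field-id -> column name with a special character.
table_fields = {1: "user:name", 2: "id"}

# Writing: store the sanitized name together with the field-id.
file_columns = {
    sanitize_name(table_fields[1]): (1, ["a", "b"]),
    sanitize_name(table_fields[2]): (2, [10, 20]),
}

# Reading: look columns up by field-id and restore the table's names.
restored = rename_by_field_id(file_columns, table_fields)
```

Because the lookup goes through the field-id, the reader does not need to know how the writer escaped the name, which is why correcting only the read path would hide, rather than fix, names written incorrectly.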

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

