moriyoshi commented on PR #8144:
URL: https://github.com/apache/iceberg/pull/8144#issuecomment-1696747907
Sorry for leaving this for a while; I've been quite busy. Let me try to answer the questions now.

> I'd like to highlight some areas where we could potentially improve.
>
> 1. If we choose to filter out some nested field, pyarrow_to_schema will fail even with ignore_unprojectable_fields = True.

This was indeed a bug, and I've just pushed a fix. Could you take a look?

> 2. May be we can let pyarrow_to_schema take the complete table schema rather than projected schema. In this way, we can focus on dealing with fields that are missing in the table schema and let
> https://github.com/apache/iceberg/blob/91161185ce53abbaaee992ebc1d412052e87852b/python/pyiceberg/io/pyarrow.py#L773
> handle the unselected columns.

Actually, what `pyarrow_to_schema` expects to receive as `projected_schema` is the schema from the catalog. In hindsight, "projected" isn't a good word in this context. If the "complete table schema" you mention refers to the catalog schema, the behavior should be pretty much what you expect here.

> 3. If we have MapType whose key type is nested, pyarrow_to_schema will fail to switch to the correct inner schema when visiting MapType's value.

This should have been fixed along with 1.
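The fixes for points 1 and 3 rely on two ideas. First, fields that the catalog schema does not know about are dropped at any nesting depth. Second, when visiting a map, the key and the value are each matched against their own catalog counterpart. Below is a minimal pure-Python sketch of those two ideas, assuming a simple name-based match.

It does not use pyiceberg or pyarrow. `Primitive`, `Struct`, `Map`, and `project` are hypothetical stand-ins, and `ignore_unprojectable_fields` only mirrors the flag name from the discussion. It is not the code in the PR.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical, minimal type model used only for illustration;
# these are NOT pyiceberg or pyarrow classes.
@dataclass(frozen=True)
class Primitive:
    name: str


@dataclass(frozen=True)
class Struct:
    fields: Dict[str, "DataType"]


@dataclass(frozen=True)
class Map:
    key: "DataType"
    value: "DataType"


DataType = Union[Primitive, Struct, Map]


def project(file_type: DataType, catalog_type: DataType,
            ignore_unprojectable_fields: bool = False) -> DataType:
    """Keep only the parts of file_type that catalog_type describes (hypothetical sketch)."""
    if isinstance(file_type, Struct) and isinstance(catalog_type, Struct):
        projected: Dict[str, DataType] = {}
        for name, child in file_type.fields.items():
            if name not in catalog_type.fields:
                if ignore_unprojectable_fields:
                    continue  # drop the field, whatever depth it appears at
                raise ValueError(f"field {name!r} is not in the catalog schema")
            # Recurse with the matching catalog child, not the parent struct.
            projected[name] = project(child, catalog_type.fields[name],
                                      ignore_unprojectable_fields)
        return Struct(projected)
    if isinstance(file_type, Map) and isinstance(catalog_type, Map):
        # Key and value are each paired with their own catalog counterpart, so
        # visiting a nested key type cannot leave the wrong context for the value.
        return Map(
            key=project(file_type.key, catalog_type.key, ignore_unprojectable_fields),
            value=project(file_type.value, catalog_type.value, ignore_unprojectable_fields),
        )
    if isinstance(file_type, Primitive) and isinstance(catalog_type, Primitive):
        return file_type
    raise TypeError(f"cannot project {file_type!r} onto {catalog_type!r}")


# Usage: the file has an extra field nested in the map key and an extra top-level column.
file_schema = Struct({
    "id": Primitive("long"),
    "props": Map(
        key=Struct({"k": Primitive("string"), "extra": Primitive("int")}),
        value=Struct({"v": Primitive("double")}),
    ),
    "debug": Primitive("string"),
})
catalog_schema = Struct({
    "id": Primitive("long"),
    "props": Map(
        key=Struct({"k": Primitive("string")}),
        value=Struct({"v": Primitive("double")}),
    ),
})
result = project(file_schema, catalog_schema, ignore_unprojectable_fields=True)
```

Without `ignore_unprojectable_fields=True`, the same call raises `ValueError` at the first unknown field, which is `extra` inside the map key.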
