[GitHub] [iceberg] moriyoshi commented on pull request #8144: Python: allow projection of Iceberg fields to pyarrow table schema with names

via GitHub Mon, 28 Aug 2023 21:29:44 -0700


moriyoshi commented on PR #8144:
URL: https://github.com/apache/iceberg/pull/8144#issuecomment-1696747907


   Sorry for leaving this unattended for a while; I've been quite busy. Let me try to 
answer the questions now.
   
   > I'd like to highlight some areas where we could potentially improve.
   > 
   > 1. If we choose to filter out some nested field, pyarrow_to_schema will 
fail even with ignore_unprojectable_fields = True.
   
   This was a bug to be addressed, and I just pushed the fix.  Could you take a 
look?
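
For illustration only, here is a minimal sketch of the behavior point 1 asks for. It is not the code in this PR: schemas are modeled as plain dicts (field name to type, nested dicts for structs), and `project_by_name` is a hypothetical helper. The only thing it is meant to show is that the `ignore_unprojectable_fields` flag has to be carried into nested structs, so that a filtered-out nested field is skipped rather than raising.

```python
from typing import Any, Dict

# Hypothetical stand-in for a schema: field name -> type, where a type is
# either a string (primitive) or another dict (nested struct).
Struct = Dict[str, Any]


def project_by_name(
    file_schema: Struct,
    catalog_schema: Struct,
    ignore_unprojectable_fields: bool = False,
) -> Struct:
    """Match file fields to catalog fields by name, recursing into structs."""
    result: Struct = {}
    for name, file_type in file_schema.items():
        if name not in catalog_schema:
            if ignore_unprojectable_fields:
                # Field is absent from the catalog schema: drop it silently.
                continue
            raise ValueError(f"Field not found in catalog schema: {name}")
        catalog_type = catalog_schema[name]
        if isinstance(file_type, dict) and isinstance(catalog_type, dict):
            # Pass the flag down so nested fields get the same treatment.
            result[name] = project_by_name(
                file_type, catalog_type, ignore_unprojectable_fields
            )
        else:
            result[name] = catalog_type
    return result


file_schema = {
    "id": "long",
    "location": {"lat": "double", "lon": "double", "alt": "double"},
}
catalog_schema = {"id": "long", "location": {"lat": "double", "lon": "double"}}

projected = project_by_name(
    file_schema, catalog_schema, ignore_unprojectable_fields=True
)
```

With the flag set, `location.alt` is dropped from the result; without it, the same call raises `ValueError`.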
   
   > 2. May be we can let pyarrow_to_schema take the complete table schema 
rather than projected schema. In this way, we can focus on dealing with fields 
that are missing in the table schema and let
   > 
https://github.com/apache/iceberg/blob/91161185ce53abbaaee992ebc1d412052e87852b/python/pyiceberg/io/pyarrow.py#L773
   >    handle the unselected columns.
   
   Actually, what `pyarrow_to_schema` expects to receive as `projected_schema` is 
the schema from the catalog (I agree the word "projected" isn't a good fit in 
this context). If "complete table schema" refers to the catalog schema, then 
the current behavior should already be pretty much what is expected here.
   
   > 3.  If we have MapType whose key type is nested, pyarrow_to_schema will 
fail to switch to the correct inner schema when visiting MapType's value.
   
   This should've been fixed along with 1.
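
For illustration only, the kind of bookkeeping point 3 is about can be sketched as follows. This is not the code in this PR: `NameProjector` is a hypothetical class, structs are modeled as dicts, and maps as `("map", key_type, value_type)` tuples. The sketch keeps a stack of catalog-side types and pops the key's context before pushing the value's, so a nested key type cannot leave the visitor pointing at the wrong inner schema.

```python
from typing import Any, List


class NameProjector:
    """Hypothetical visitor that walks a file type next to a catalog type."""

    def __init__(self, catalog_type: Any) -> None:
        # Top of the stack is the catalog-side type for the node being visited.
        self._stack: List[Any] = [catalog_type]

    def _visit_with(self, catalog_type: Any, file_type: Any) -> Any:
        # Push the matching catalog type, visit, and always restore the context.
        self._stack.append(catalog_type)
        try:
            return self.visit(file_type)
        finally:
            self._stack.pop()

    def visit(self, file_type: Any) -> Any:
        current = self._stack[-1]
        if isinstance(file_type, dict) and isinstance(current, dict):
            # Struct: keep only fields that the catalog schema knows by name.
            return {
                name: self._visit_with(current[name], child)
                for name, child in file_type.items()
                if name in current
            }
        if isinstance(file_type, tuple) and file_type[0] == "map":
            _, file_key, file_value = file_type
            _, catalog_key, catalog_value = current
            # The key is visited and its context popped before the value
            # is visited against the value's own catalog type.
            new_key = self._visit_with(catalog_key, file_key)
            new_value = self._visit_with(catalog_value, file_value)
            return ("map", new_key, new_value)
        # Primitive: take the catalog's type.
        return current


file_type = {
    "m": ("map", {"a": "int", "b": "int"}, {"x": "string", "y": "string"})
}
catalog_type = {"m": ("map", {"a": "long"}, {"x": "string"})}

projected = NameProjector(catalog_type).visit(file_type)
```

Here both the nested key struct and the value struct are resolved against their own catalog-side types, and the fields `b` and `y`, which the catalog does not know, are dropped.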
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


