[PR] Apply Name mapping [iceberg-python]

via GitHub Sat, 16 Dec 2023 17:21:20 -0800


syun64 opened a new pull request, #219:
URL: https://github.com/apache/iceberg-python/pull/219


   Closes: https://github.com/apache/iceberg-python/issues/202
   
   Based on the following two working branches from @Fokko :
   
   1. Name-mapping plumbing: https://github.com/apache/iceberg-python/pull/212
   2. Allow missing field-ids from schema: 
https://github.com/apache/iceberg-python/pull/183
   
   This PR adds an _ApplyNameMapping SchemaVisitor that traverses the pyarrow 
schema and applies the provided name_mapping.
   The pyarrow_to_schema function resolves field IDs in the following order of preference:
   1. Use field_ids in file_schema
   2. Use name_mapping (if it exists)
   3. Fall back to file column order if neither of the above two applies
   
   This order is motivated by the current logic in [Spark Iceberg Parquet Read 
Conf](https://github.com/apache/iceberg/blob/24578a28fe69db96da460ac49eeb1a60fee7b8c7/parquet/src/main/java/org/apache/iceberg/parquet/ReadConf.java#L85)
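
The preference order described above can be sketched in plain Python. This is a minimal illustration, not the code in this PR: `assign_field_ids` and its parameters are hypothetical names, the name mapping is simplified to a flat name-to-ID dict, columns missing from the mapping are simply skipped, and the fallback numbers columns from 1 by position (an assumption made for the example).

```python
from typing import Dict, List, Optional


def assign_field_ids(
    column_names: List[str],
    file_field_ids: Optional[Dict[str, int]] = None,
    name_mapping: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Hypothetical helper: pick a field ID for each column in the file."""
    # 1. Prefer field IDs that are already stored in the file schema.
    if file_field_ids:
        return {n: file_field_ids[n] for n in column_names if n in file_field_ids}

    # 2. Otherwise use the name mapping, if one was provided.
    #    Assumption: columns absent from the mapping get no ID and are skipped.
    if name_mapping is not None:
        return {n: name_mapping[n] for n in column_names if n in name_mapping}

    # 3. Otherwise fall back to the column order in the file.
    #    Assumption: positional IDs start at 1.
    return {n: position for position, n in enumerate(column_names, start=1)}
```

For example, `assign_field_ids(["id", "name"], None, {"id": 10, "name": 20})` takes the second branch, while `assign_field_ids(["id", "name"])` falls through to the positional fallback.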
   
   TODO:
   - [ ] Read and use the table property 'schema.name-mapping.default'
   - [ ] Add many more test cases to cover edge cases, such as a field_id of -1
   - [ ] Get more context on identifier_field_ids: should they be ignored when 
field_ids aren't set in the file_schema?
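
For the first TODO item, here is a rough sketch of reading the property and flattening it into a lookup table. It assumes the property value is a JSON list of objects with `field-id`, `names`, and optional nested `fields`, which is how the Iceberg spec serializes name mappings. `flatten_name_mapping` is a hypothetical helper, and joining nested names with dots is a choice made for the example, not necessarily how this PR represents them.

```python
import json
from typing import Any, Dict, List


def flatten_name_mapping(mapping_json: str) -> Dict[str, int]:
    """Hypothetical helper: map each (dotted) column name to its field ID."""
    result: Dict[str, int] = {}

    def walk(fields: List[Dict[str, Any]], prefix: str) -> None:
        for field in fields:
            # A mapped field may list several alternative names.
            for name in field.get("names", []):
                path = f"{prefix}.{name}" if prefix else name
                if "field-id" in field:
                    result[path] = field["field-id"]
                # Recurse into nested fields under this name.
                walk(field.get("fields", []), path)

    walk(json.loads(mapping_json), "")
    return result


# Table properties are plain string key/value pairs; this value is made up.
properties = {
    "schema.name-mapping.default": json.dumps(
        [
            {"field-id": 1, "names": ["id", "record_id"]},
            {
                "field-id": 2,
                "names": ["location"],
                "fields": [{"field-id": 3, "names": ["lat"]}],
            },
        ]
    )
}

mapping = flatten_name_mapping(properties["schema.name-mapping.default"])
```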
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org