syun64 opened a new pull request, #219: URL: https://github.com/apache/iceberg-python/pull/219
Closes: https://github.com/apache/iceberg-python/issues/202

Based on the following two working branches from @Fokko:

1. Name-mapping plumbing: https://github.com/apache/iceberg-python/pull/212
2. Allow missing field-ids from schema: https://github.com/apache/iceberg-python/pull/183

This PR adds an _ApplyNameMapping SchemaVisitor that traverses the pyarrow schema and applies the provided name_mapping.

The order of preference in the pyarrow_to_schema function is:

1. Use the field_ids in file_schema
2. Use name_mapping (if it exists)
3. Fall back to the file column order if neither of the above two works

This order is motivated by the current logic in [Spark Iceberg Parquet Read Conf](https://github.com/apache/iceberg/blob/24578a28fe69db96da460ac49eeb1a60fee7b8c7/parquet/src/main/java/org/apache/iceberg/parquet/ReadConf.java#L85).

TODO:

- [ ] Read and use the table property 'schema.name-mapping.default'
- [ ] Add many more test cases to cover edge cases, such as field_id -1
- [ ] Get more context on identifier_field_ids: should they be ignored when field_ids aren't set in the file_schema?
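
For readers skimming the PR, the three-step order of preference described above can be sketched in plain Python. This is only an illustration, not the code in this PR: `FileField` and `assign_field_ids` are hypothetical names, the name mapping is reduced to a flat `dict` from column name to field ID, and the choice to start fallback IDs at 1 is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileField:
    """Hypothetical stand-in for one top-level column of a file schema."""

    name: str
    field_id: Optional[int] = None  # None means the file carries no field ID


def assign_field_ids(
    fields: List[FileField],
    name_mapping: Optional[Dict[str, int]] = None,
) -> List[Optional[int]]:
    """Return one field ID per column, following the order of preference."""
    # 1. Prefer field IDs already present in the file schema.
    if any(f.field_id is not None for f in fields):
        return [f.field_id for f in fields]
    # 2. Otherwise look each column up by name, if a mapping was provided.
    #    Columns missing from the mapping get None.
    if name_mapping is not None:
        return [name_mapping.get(f.name) for f in fields]
    # 3. Otherwise fall back to column order (1-based here, an assumption).
    return [position for position, _ in enumerate(fields, start=1)]


columns = [FileField("id"), FileField("name")]
print(assign_field_ids(columns, {"id": 10, "name": 20}))  # mapping is used
print(assign_field_ids(columns))  # positional fallback
```

A real implementation would also need to handle nested struct, list, and map fields, which this flat sketch ignores.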