Re: [PR] Apply Name mapping [iceberg-python]

via GitHub Mon, 18 Dec 2023 19:38:10 -0800


syun64 commented on PR #219:
URL: https://github.com/apache/iceberg-python/pull/219#issuecomment-1862055260


   Hi folks, I've rebased the branch from main to reflect the changes in [Add 
name-mapping](https://github.com/apache/iceberg-python/pull/212), and added 
some negative test cases.
   
   I have some questions:
   1. The current proposed approach assumes that if any of the fields present 
in the pyarrow schema does not have a field ID, it has missing field IDs, and 
hence would require the table name mapping property to be set. Conversely, the 
version of ConvertToIceberg visitor on main [simply 
ignores](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L744)
 any fields that do not have field_ids in its metadata, and hence is able to 
read files with partially missing field_ids. Does the proposed behavior sound 
correct?
   2. In order to achieve above, I am creating NestedFields with field_id=-1 
(since there is an int_type constraint on NestedField. Although this feels 
arbitrary, we could argue that this is reasonable since '-1' is also Parquet's 
choice of arbitrary int when representing missing field_ids. 
[Example](https://arrow.apache.org/docs/python/parquet.html#finer-grained-reading-and-writing):
   ```
   <pyarrow._parquet.ParquetSchema object at 0x7f72b317fec0>
   required group field_id=-1 schema {
     optional double field_id=-1 one;
     optional binary field_id=-1 two (String);
     optional boolean field_id=-1 three;
     optional binary field_id=-1 __index_level_0__ (String);
   }
   ```
   Does this sound reasonable?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Apply Name mapping [iceberg-python]

Reply via email to