Re: [I] API table.scan does not conform to Iceberg spec for identity partition columns [iceberg-python]

via GitHub Fri, 13 Dec 2024 14:30:20 -0800


gabeiglio commented on issue #1401:
URL: 
https://github.com/apache/iceberg-python/issues/1401#issuecomment-2542474285


   I'm open to feedback, but after investigating this issue I'm inclined to 
think the fix belongs in 
[_task_to_record_batches](https://github.com/apache/iceberg-python/blob/a97d13c17cd03f86252b9df2c65532ec45fb05da/pyiceberg/io/pyarrow.py#L1219C1-L1219C4).
 
   
   By comparing the projected schema with the file projection schema, we could:
   
   1. Look up each field id that is missing from the file in the partition 
spec and check whether its transform is an instance of IdentityTransform
   2. Check whether the data file's partition struct contains that partition 
field (matching by name)
   3. Try to inject the new column into the resulting RecordBatch 
   
   I'm still figuring out how to do step three (and whether it's possible). 
Apologies for the slow pace, but I've been a bit short on time. @kevinjqliu 
does this make sense?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


