gabeiglio commented on issue #1401: URL: https://github.com/apache/iceberg-python/issues/1401#issuecomment-2542474285
I'm open to feedback, but after investigating this issue I'm inclined to think the fix needs to be in [_task_to_record_batches](https://github.com/apache/iceberg-python/blob/a97d13c17cd03f86252b9df2c65532ec45fb05da/pyiceberg/io/pyarrow.py#L1219C1-L1219C4). By comparing the projected schema against the file projection schema, we could:

1. Look up the field ID that is missing from the file in the partition spec and check whether its transform is an instance of `IdentityTransform`.
2. Check whether the data file's partition struct contains that partition field (matched by name).
3. Try to inject this new column into the resulting RecordBatch.

I'm still figuring out how to do step three (and whether it's possible). Apologies for the slow pace, but I've been a bit short on time. @kevinjqliu does this make sense?