sungwy commented on code in PR #1026: URL: https://github.com/apache/iceberg-python/pull/1026#discussion_r1710310025
########## pyiceberg/io/pyarrow.py: ##########
@@ -1249,11 +1251,12 @@ def _task_to_record_batches(
         # https://github.com/apache/arrow/issues/39220
         arrow_table = pa.Table.from_batches([batch])
         arrow_table = arrow_table.filter(pyarrow_filter)
+        if len(arrow_table) == 0:
+            continue
         batch = arrow_table.to_batches()[0]
     yield _to_requested_schema(
         projected_schema, file_project_schema, batch, downcast_ns_timestamp_to_us=True, use_large_types=use_large_types
     )
-    current_index += len(batch)

Review Comment:
When working on fixing https://github.com/apache/iceberg-python/issues/1024, I realized that a correctness issue was introduced here: we are using the length of the filtered batch instead of the original one when tracking `current_index`. I think it'll be crucial to get this fix into 0.7.1 as soon as possible to support our MOR (merge-on-read) users.
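To illustrate why the original batch length matters, here is a minimal pure-Python sketch. It assumes positional deletes are given as a set of 0-based row positions within the data file, that each batch is a plain list of rows, and that `current_index` is meant to hold the file-level position of the first row in the current batch. `read_with_deletes`, `read_with_deletes_buggy`, and `keep_row` are hypothetical names for illustration only, not PyIceberg APIs; the real code operates on `pa.RecordBatch` objects.

```python
from typing import Callable, Iterable, Iterator, List, Set


def read_with_deletes(
    batches: Iterable[List[int]],
    deleted_positions: Set[int],
    keep_row: Callable[[int], bool],  # hypothetical stand-in for the row filter
) -> Iterator[List[int]]:
    """Hypothetical reader: advance the file position by the ORIGINAL batch length."""
    current_index = 0  # file-level position of the first row in the current batch
    for batch in batches:
        original_len = len(batch)
        # Positional deletes refer to row positions in the data file.
        batch = [row for offset, row in enumerate(batch)
                 if current_index + offset not in deleted_positions]
        # Advance before filtering (and before any `continue`), so neither the
        # deletes nor the filter can shift the positions of later batches.
        current_index += original_len
        batch = [row for row in batch if keep_row(row)]
        if not batch:
            continue
        yield batch


def read_with_deletes_buggy(
    batches: Iterable[List[int]],
    deleted_positions: Set[int],
    keep_row: Callable[[int], bool],
) -> Iterator[List[int]]:
    """Hypothetical reader reproducing the problem: it advances by the filtered length."""
    current_index = 0
    for batch in batches:
        batch = [row for offset, row in enumerate(batch)
                 if current_index + offset not in deleted_positions]
        batch = [row for row in batch if keep_row(row)]
        if not batch:
            continue  # the index is never advanced for this batch
        yield batch
        current_index += len(batch)  # BUG: filtered length, not the original length


batches = [[10, 11, 12], [13, 14, 15]]
# Position 4 (the row holding 14) is deleted; the filter drops 11.
print(list(read_with_deletes(batches, {4}, lambda r: r != 11)))
print(list(read_with_deletes_buggy(batches, {4}, lambda r: r != 11)))
```

In the buggy variant the filter shrinks the first batch from 3 rows to 2, so the second batch is treated as starting at position 2 instead of 3: the deleted row 14 comes back and row 15 is dropped instead. The sketch advances the index before the empty-batch `continue` as well, since skipping that update would shift every later batch in the same way.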