Fokko commented on code in PR #1026:
URL: https://github.com/apache/iceberg-python/pull/1026#discussion_r1711043760
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1249,11 +1251,12 @@ def _task_to_record_batches(
                 # https://github.com/apache/arrow/issues/39220
                 arrow_table = pa.Table.from_batches([batch])
                 arrow_table = arrow_table.filter(pyarrow_filter)
+                if len(arrow_table) == 0:
+                    continue
                 batch = arrow_table.to_batches()[0]
             yield _to_requested_schema(
                 projected_schema, file_project_schema, batch, downcast_ns_timestamp_to_us=True, use_large_types=use_large_types
             )
-            current_index += len(batch)

Review Comment:
   Oof, that's a good find. Thanks @vhnguyenae for reporting this! The order of applying filters also caught me out when implementing positional deletes.

   In the long run, I think it would be good to push this down to Arrow. I created an issue for that a while ago, https://github.com/apache/arrow/issues/35301, but it hasn't seen much traction.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
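
To make the ordering problem in the review comment concrete, here is a minimal pure-Python sketch. It is not the pyiceberg implementation: `read_batches`, its parameters, and the list-of-lists "batches" are hypothetical stand-ins, and the way the offset is advanced in the fixed path is an assumption inferred from the removed `current_index += len(batch)` line. The idea is that positional deletes are expressed as row positions in the file, so the running offset has to advance by the unfiltered batch length, and a batch that is empty after filtering has to be skipped rather than indexed.

```python
from typing import Callable, Iterable, Iterator, List, Set


def read_batches(
    batches: Iterable[List[int]],
    deleted_positions: Set[int],
    keep: Callable[[int], bool],
    advance_by_filtered_length: bool = False,
) -> Iterator[List[int]]:
    """Yield surviving rows per batch (hypothetical model, not pyiceberg code)."""
    current_index = 0  # file-relative position of the first row of the batch
    for batch in batches:
        start = current_index
        # 1. Positional deletes: drop rows whose file position is deleted.
        survivors = [
            row for offset, row in enumerate(batch)
            if start + offset not in deleted_positions
        ]
        # 2. User filter, applied after the positional deletes.
        survivors = [row for row in survivors if keep(row)]
        if advance_by_filtered_length:
            # Models the removed line: the offset drifts once rows are filtered out.
            current_index += len(survivors)
        else:
            # Assumed fix: always advance by the original batch length.
            current_index += len(batch)
        # 3. Skip batches with no remaining rows (mirrors the added guard).
        if not survivors:
            continue
        yield survivors


batches = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
deleted = {4}  # file position 4 holds the value 5

fixed = list(read_batches(batches, deleted, keep=lambda v: v > 3))
drifted = list(read_batches(batches, deleted, keep=lambda v: v > 3,
                            advance_by_filtered_length=True))
print(fixed)    # [[4, 6], [7, 8, 9]]
print(drifted)  # [[4, 5, 6], [7, 9]]
```

With the drifting offset, the first batch is filtered to nothing, the offset never moves, and the delete for position 4 is applied to the wrong row in a later batch. In the real code path, the empty case would additionally hit `to_batches()[0]`, which is what the `len(arrow_table) == 0` guard in the hunk avoids.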