Fokko commented on code in PR #1026:
URL: https://github.com/apache/iceberg-python/pull/1026#discussion_r1711043760


##########
pyiceberg/io/pyarrow.py:
##########
@@ -1249,11 +1251,12 @@ def _task_to_record_batches(
                     # https://github.com/apache/arrow/issues/39220
                     arrow_table = pa.Table.from_batches([batch])
                     arrow_table = arrow_table.filter(pyarrow_filter)
+                    if len(arrow_table) == 0:
+                        continue
                     batch = arrow_table.to_batches()[0]
             yield _to_requested_schema(
                 projected_schema, file_project_schema, batch, downcast_ns_timestamp_to_us=True, use_large_types=use_large_types
             )
-            current_index += len(batch)

Review Comment:
   Oof, that's a good find. Thanks @vhnguyenae for reporting this!
   
   The order of applying filters also caught me when implementing positional 
deletes. In the long run, I think it would be good to push this down to Arrow. 
I created an issue for that a while ago, but it hasn't seen much traction: 
https://github.com/apache/arrow/issues/35301
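
   To illustrate the ordering concern, here is a simplified sketch, not the 
actual PyIceberg code. It assumes plain Python lists stand in for Arrow record 
batches, and the names `scan_batches`, `deleted_positions` and `row_filter` are 
hypothetical. The point is that the running file position has to advance by the 
unfiltered batch length, otherwise later positional deletes are matched against 
the wrong rows.

```python
from typing import Callable, Iterable, Iterator, List, Set


def scan_batches(
    batches: Iterable[List[int]],
    deleted_positions: Set[int],
    row_filter: Callable[[int], bool],
) -> Iterator[List[int]]:
    """Apply positional deletes first, then the row filter, batch by batch."""
    current_index = 0  # file position of the first row in the current batch
    for batch in batches:
        # Remember the unfiltered length: positional deletes refer to row
        # positions in the data file, not to positions after filtering.
        batch_len = len(batch)

        # Step 1: drop rows whose file position is marked as deleted.
        kept = [
            row
            for offset, row in enumerate(batch)
            if current_index + offset not in deleted_positions
        ]

        # Step 2: apply the row filter to the surviving rows.
        kept = [row for row in kept if row_filter(row)]

        # Advance by the original length, regardless of how many rows survived.
        current_index += batch_len

        # Skip batches that end up empty instead of yielding them.
        if not kept:
            continue
        yield kept


if __name__ == "__main__":
    # File position 4 (the value 5) is deleted; the filter keeps values > 2.
    result = list(scan_batches([[1, 2, 3], [4, 5, 6]], {4}, lambda v: v > 2))
    print(result)
```

   If `current_index` were advanced by the filtered length instead, the second 
batch in this example would start at position 1 rather than 3, and the deleted 
row would slip through.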



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

