Re: [PR] Support `Table.to_arrow_batch_reader` to return RecordBatchReader instead of a fully materialized Arrow Table [iceberg-python]

via GitHub Wed, 12 Jun 2024 19:49:34 -0700


syun64 commented on code in PR #786:
URL: https://github.com/apache/iceberg-python/pull/786#discussion_r1637423967



##########
pyiceberg/io/pyarrow.py:
##########
@@ -1005,36 +1004,46 @@ def _task_to_table(
             columns=[col.name for col in file_project_schema.columns],
         )
 
-        if positional_deletes:
-            # Create the mask of indices that we're interested in
-            indices = _combine_positional_deletes(positional_deletes, fragment.count_rows())
-
-            if limit:
-                if pyarrow_filter is not None:
-                    # In case of the filter, we don't exactly know how many rows
-                    # we need to fetch upfront, can be optimized in the future:
-                    # https://github.com/apache/arrow/issues/35301
-                    arrow_table = fragment_scanner.take(indices)
-                    arrow_table = arrow_table.filter(pyarrow_filter)
-                    arrow_table = arrow_table.slice(0, limit)
-                else:
-                    arrow_table = fragment_scanner.take(indices[0:limit])
-            else:
-                arrow_table = fragment_scanner.take(indices)
+        current_index = 0
+        batches = fragment_scanner.to_batches()

Review Comment:
   That's a great suggestion, @corleyma. I'll adopt this feedback when I make the
next round of changes.
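
For context, the two added lines in the hunk above (`current_index = 0` and `fragment_scanner.to_batches()`) suggest that the fragment is now processed batch by batch rather than through a single `take` over the whole fragment. A minimal sketch of that idea, assuming the keep indices are sorted, file-global row positions and that a running offset maps them to batch-local positions. The function `local_keep_indices` and its parameters are hypothetical names for illustration, not code from the PR:

```python
from typing import Iterable, Iterator, List, Sequence


def local_keep_indices(
    batch_lengths: Iterable[int], keep_indices: Sequence[int]
) -> Iterator[List[int]]:
    """Yield, for each batch, the batch-local positions of the rows to keep.

    Assumption: keep_indices is sorted and holds file-global row positions.
    """
    current_index = 0  # global position of the first row in the current batch
    cursor = 0  # next unread entry in keep_indices
    for length in batch_lengths:
        batch_end = current_index + length
        local: List[int] = []
        # Consume every global index that falls inside this batch.
        while cursor < len(keep_indices) and keep_indices[cursor] < batch_end:
            local.append(keep_indices[cursor] - current_index)
            cursor += 1
        yield local
        current_index = batch_end


# Three batches of 3, 3 and 2 rows; global rows 0, 2, 3 and 7 survive the deletes.
result = list(local_keep_indices([3, 3, 2], [0, 2, 3, 7]))
```

With pyarrow, each yielded list could presumably be passed to `RecordBatch.take` for the matching batch, so only one batch is held in memory at a time. Whether the PR ends up doing exactly this is not stated in the thread.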



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


