kevinjqliu commented on code in PR #2676:
URL: https://github.com/apache/iceberg-python/pull/2676#discussion_r2487206100
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1726,16 +1726,10 @@ def to_record_batches(self, tasks: Iterable[FileScanTask]) -> Iterator[pa.Record
         deletes_per_file = _read_all_delete_files(self._io, tasks)
         total_row_count = 0
-        executor = ExecutorFactory.get_or_create()
-
-        def batches_for_task(task: FileScanTask) -> List[pa.RecordBatch]:
-            # Materialize the iterator here to ensure execution happens within the executor.
-            # Otherwise, the iterator would be lazily consumed later (in the main thread),
-            # defeating the purpose of using executor.map.
-            return list(self._record_batches_from_scan_tasks_and_deletes([task], deletes_per_file))
Review Comment:
   This logic was added in #1995 to parallelize the work using the executor.

   > My expectation for to_record_batches was that it would yield batches and not materialize an entire parquet file in memory, but it looks like the current implementation explicitly does this.

   That would be my expectation as well. The current implementation reads an entire parquet file into memory. BUT I do still want to keep the ability to parallelize reading... let's see if there's a better way to refactor this.
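   One possible shape for such a refactor (a hedged sketch, not PyIceberg's actual API: `parallel_batches` and `batches_for_task` are hypothetical names) is to have worker threads push batches into a bounded queue as they are produced, so the consumer can yield batches incrementally while reads still run in parallel. The bounded queue provides backpressure, so no file is fully materialized up front. Note that batches from different tasks interleave, so a real refactor would also need to decide whether per-task ordering matters:

   ```python
   # Hedged sketch: parallelize per-task reads while yielding batches lazily,
   # instead of materializing each file with list(...). All names here are
   # illustrative, not part of PyIceberg.
   import queue
   import threading
   from concurrent.futures import ThreadPoolExecutor
   from typing import Callable, Iterable, Iterator, TypeVar

   T = TypeVar("T")
   R = TypeVar("R")

   _SENTINEL = object()  # marks end of stream on the queue


   def parallel_batches(
       tasks: Iterable[T],
       batches_for_task: Callable[[T], Iterator[R]],
       max_workers: int = 4,
       buffer_size: int = 16,
   ) -> Iterator[R]:
       """Yield batches from all tasks as worker threads produce them.

       The bounded queue keeps at most `buffer_size` batches in memory,
       so workers block (backpressure) instead of materializing files.
       Error handling is simplified for the sketch: the sentinel is always
       enqueued so the consumer terminates, but worker exceptions are not
       re-raised here.
       """
       q: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)

       def worker(task: T) -> None:
           # Consume the per-task iterator inside the pool, one batch at
           # a time, pushing each batch to the consumer as it is ready.
           for batch in batches_for_task(task):
               q.put(batch)

       def run_all() -> None:
           try:
               with ThreadPoolExecutor(max_workers=max_workers) as pool:
                   for future in [pool.submit(worker, t) for t in tasks]:
                       future.result()
           finally:
               q.put(_SENTINEL)

       threading.Thread(target=run_all, daemon=True).start()
       while (item := q.get()) is not _SENTINEL:
           yield item
   ```

   A caller would then do `yield from parallel_batches(tasks, lambda t: self._record_batches_from_scan_tasks_and_deletes([t], deletes_per_file))` rather than flattening eagerly materialized lists.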
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]