koenvo commented on code in PR #1995:
URL: https://github.com/apache/iceberg-python/pull/1995#discussion_r2122696555


##########
pyiceberg/io/pyarrow.py:
##########

@@ -1643,8 +1646,20 @@ def to_record_batches(self, tasks: Iterable[FileScanTask]) -> Iterator[pa.RecordBatch]:
             ResolveError: When a required field cannot be found in the file
             ValueError: When a field type in the file cannot be projected to the schema type
         """
+        from concurrent.futures import ThreadPoolExecutor
+
         deletes_per_file = _read_all_delete_files(self._io, tasks)
-        return self._record_batches_from_scan_tasks_and_deletes(tasks, deletes_per_file)
+
+        if concurrent_tasks is not None:
+            with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:

Review Comment:
   Ah, I already had this changed but forgot to push. I only need to make sure I get a pool with the correct `max_workers` set; I can't just use the regular `get_or_create`, as that might have been created with an incorrect number of workers.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
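The reviewer's concern about `get_or_create` can be sketched with a minimal example. The helper below is a hypothetical stand-in for a shared-executor cache (not pyiceberg's actual implementation): once the singleton pool exists, later `max_workers` arguments are silently ignored, which is why a call that needs a specific worker count must build its own `ThreadPoolExecutor`, as the diff does.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a cached shared-executor helper: the pool is
# created once, so the max_workers of the first caller wins.
_shared_pool = None

def get_or_create(max_workers=None):
    global _shared_pool
    if _shared_pool is None:
        _shared_pool = ThreadPoolExecutor(max_workers=max_workers)
    return _shared_pool  # later max_workers values are silently ignored

def run_with_dedicated_pool(items, concurrent_tasks):
    # A dedicated pool honors the requested worker count on every call,
    # at the cost of creating and tearing down threads per invocation.
    with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
        return list(pool.map(lambda x: x * 2, items))
```

The trade-off: the cached pool avoids repeated thread start-up, while the dedicated pool guarantees the caller-specified concurrency, which is what this PR needs for `concurrent_tasks`.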
########## pyiceberg/io/pyarrow.py: ########## @@ -1643,8 +1646,20 @@ def to_record_batches(self, tasks: Iterable[FileScanTask]) -> Iterator[pa.Record ResolveError: When a required field cannot be found in the file ValueError: When a field type in the file cannot be projected to the schema type """ + from concurrent.futures import ThreadPoolExecutor + deletes_per_file = _read_all_delete_files(self._io, tasks) - return self._record_batches_from_scan_tasks_and_deletes(tasks, deletes_per_file) + + if concurrent_tasks is not None: + with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool: Review Comment: Ah, already had this changed but forgot to push. Only need to make sure I get a pool with the correct max_workers set. Can't just use the regular `get_or_create` as that might have an incorrect number of workers. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org