corleyma commented on code in PR #1995:
URL: https://github.com/apache/iceberg-python/pull/1995#discussion_r2122591607
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1643,8 +1646,20 @@ def to_record_batches(self, tasks: Iterable[FileScanTask]) -> Iterator[pa.Record
             ResolveError: When a required field cannot be found in the file
             ValueError: When a field type in the file cannot be projected to the schema type
         """
+        from concurrent.futures import ThreadPoolExecutor
+
         deletes_per_file = _read_all_delete_files(self._io, tasks)
-        return self._record_batches_from_scan_tasks_and_deletes(tasks, deletes_per_file)
+
+        if concurrent_tasks is not None:
+            with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:

Review Comment:
   Rather than create your own threadpool executor here, I think you should use the ExecutorFactory defined elsewhere in the repo. It has a get_or_create method that prevents creating a new threadpool on every call, among other things.
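   For reference, a minimal sketch of what the suggestion could look like. It assumes `ExecutorFactory` is imported from `pyiceberg.utils.concurrent` (as elsewhere in the codebase), that `_record_batches_from_scan_tasks_and_deletes` can be called per task as in the PR diff, and that the per-call `concurrent_tasks` knob is replaced by the shared pool's existing max-workers configuration; the exact wiring below is illustrative, not this PR's implementation:

   ```python
   # Illustrative sketch only -- intended to sit on ArrowScan in pyiceberg/io/pyarrow.py.
   from itertools import chain
   from typing import Iterable, Iterator

   import pyarrow as pa

   from pyiceberg.utils.concurrent import ExecutorFactory  # shared, lazily created pool


   def to_record_batches(self, tasks: Iterable["FileScanTask"]) -> Iterator[pa.RecordBatch]:
       deletes_per_file = _read_all_delete_files(self._io, tasks)

       # get_or_create() returns a single process-wide ThreadPoolExecutor, so no new
       # pool is constructed on every call; its size follows the library's
       # max-workers setting rather than a per-call concurrent_tasks argument.
       executor = ExecutorFactory.get_or_create()
       batch_iterators = executor.map(
           lambda task: self._record_batches_from_scan_tasks_and_deletes([task], deletes_per_file),
           tasks,
       )
       return chain.from_iterable(batch_iterators)
   ```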