corleyma commented on code in PR #1995:
URL: https://github.com/apache/iceberg-python/pull/1995#discussion_r2122591607
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1643,8 +1646,20 @@ def to_record_batches(self, tasks: Iterable[FileScanTask]) -> Iterator[pa.Record
             ResolveError: When a required field cannot be found in the file
             ValueError: When a field type in the file cannot be projected to the schema type
         """
+        from concurrent.futures import ThreadPoolExecutor
+
         deletes_per_file = _read_all_delete_files(self._io, tasks)
-        return self._record_batches_from_scan_tasks_and_deletes(tasks, deletes_per_file)
+
+        if concurrent_tasks is not None:
+            with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:

Review Comment:
   Rather than create your own threadpool executor here, I think you should use the ExecutorFactory defined elsewhere in the repo. It has a get_or_create method that prevents creating a new threadpool on every call, among other things.
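   For reference, a minimal sketch of what the suggestion could look like. It assumes `ExecutorFactory` is imported from `pyiceberg.utils.concurrent` (as elsewhere in the codebase), that `_record_batches_from_scan_tasks_and_deletes` can be called per task as in the PR diff, and that the per-call `concurrent_tasks` knob is replaced by the shared pool's existing max-workers configuration; the exact wiring below is illustrative, not this PR's implementation:

   ```python
   # Illustrative sketch only -- intended to sit on ArrowScan in pyiceberg/io/pyarrow.py.
   from itertools import chain
   from typing import Iterable, Iterator

   import pyarrow as pa

   from pyiceberg.utils.concurrent import ExecutorFactory  # shared, lazily created pool


   def to_record_batches(self, tasks: Iterable["FileScanTask"]) -> Iterator[pa.RecordBatch]:
       deletes_per_file = _read_all_delete_files(self._io, tasks)

       # get_or_create() returns a single process-wide ThreadPoolExecutor, so no new
       # pool is constructed on every call; its size follows the library's
       # max-workers setting rather than a per-call concurrent_tasks argument.
       executor = ExecutorFactory.get_or_create()
       batch_iterators = executor.map(
           lambda task: self._record_batches_from_scan_tasks_and_deletes([task], deletes_per_file),
           tasks,
       )
       return chain.from_iterable(batch_iterators)
   ```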