Fokko commented on issue #1335: URL: https://github.com/apache/iceberg-python/issues/1335#issuecomment-2489383016
Yes, this looks like it shouldn't be too hard. I think it would be good to [re-use the `ExecutorFactory`](https://github.com/apache/iceberg-python/blob/main/pyiceberg/utils/concurrent.py). I would refactor `parquet_files_to_data_files` to take a single file instead of an `Iterator`, and rename it to `parquet_file_to_data_file`:

```python
def _parquet_files_to_data_files(table_metadata: TableMetadata, file_paths: List[str], io: FileIO) -> Iterable[DataFile]:
    """Convert a list of files into DataFiles.

    Returns:
        An iterable that supplies DataFiles that describe the parquet files.
    """
    # parquet_file_to_data_file is the single-file variant proposed above.
    from pyiceberg.io.pyarrow import parquet_file_to_data_file
    from pyiceberg.utils.concurrent import ExecutorFactory

    executor = ExecutorFactory.get_or_create()
    # Submit one task per file so the Parquet footers are read concurrently.
    futures = [
        executor.submit(
            parquet_file_to_data_file,
            io,
            table_metadata,
            file_path
        )
        for file_path in file_paths
    ]

    return [f.result() for f in futures if f.result()]
```

@kevinjqliu I would not classify this as a bugfix, so I'm not sure if this is appropriate for 0.8.1.
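
For anyone who wants to try the fan-out pattern outside PyIceberg, here is a minimal standard-library sketch of the same idea: one task per file, results collected in submission order, falsy results dropped. It is an illustration under assumptions, not the PyIceberg implementation. `convert_files_concurrently` and `fake_convert` are hypothetical names, and the sketch creates its own `ThreadPoolExecutor` rather than re-using a shared one from `ExecutorFactory`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def convert_files_concurrently(
    file_paths: Sequence[str],
    convert_one: Callable[[str], Optional[T]],
    max_workers: Optional[int] = None,
) -> List[T]:
    """Apply a per-file converter to every path using a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit one task per file so slow reads can overlap.
        futures = [executor.submit(convert_one, path) for path in file_paths]
        # Collect in submission order, calling result() once per future.
        results = [future.result() for future in futures]
    # Drop falsy results, mirroring the `if f.result()` filter in the comment.
    return [result for result in results if result]


def fake_convert(path: str) -> Optional[dict]:
    """Hypothetical stand-in for reading one Parquet file's metadata."""
    if not path.endswith(".parquet"):
        return None
    return {"file_path": path}


if __name__ == "__main__":
    print(convert_files_concurrently(["a.parquet", "notes.txt", "b.parquet"], fake_convert))
```

Because `future.result()` re-raises any exception from the worker, a failure on a single file surfaces in the caller instead of being silently skipped.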