Fokko commented on issue #1335:
URL: 
https://github.com/apache/iceberg-python/issues/1335#issuecomment-2489383016

   Yes, looks like this shouldn't be too hard. I think it would be good to 
[re-use the `ExecutorFactory`](https://github.com/apache/iceberg-python/blob/main/pyiceberg/utils/concurrent.py).
   
   I would refactor `parquet_files_to_data_files` to take a single file 
instead of an `Iterator`, and rename it to `parquet_file_to_data_file`. The 
caller can then submit one task per file:
   
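   ```python
   def _parquet_files_to_data_files(
       table_metadata: TableMetadata, file_paths: List[str], io: FileIO
   ) -> Iterable[DataFile]:
       """Convert a list of files into DataFiles.
   
       Returns:
           An iterable that supplies DataFiles that describe the parquet files.
       """
       # Proposed single-file variant of parquet_files_to_data_files (see above).
       from pyiceberg.io.pyarrow import parquet_file_to_data_file
       from pyiceberg.utils.concurrent import ExecutorFactory
   
       executor = ExecutorFactory.get_or_create()
       # Submit one task per file so the files are processed concurrently.
       futures = [
           executor.submit(parquet_file_to_data_file, io, table_metadata, file_path)
           for file_path in file_paths
       ]
   
       # Collect each result once, in submission order, and skip empty results.
       return [data_file for data_file in (f.result() for f in futures) if data_file]
   ```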
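
To try the submit-and-collect pattern without a table at hand, here is a minimal standalone sketch. It assumes a plain `ThreadPoolExecutor` from the standard library in place of `ExecutorFactory.get_or_create()`, and `describe_file` / `describe_files` are hypothetical stand-ins for the per-file helper and its caller, not pyiceberg functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def describe_file(file_path: str) -> Optional[dict]:
    """Hypothetical per-file helper; returns None for files it skips."""
    if not file_path.endswith(".parquet"):
        return None
    return {"path": file_path, "format": "PARQUET"}


def describe_files(file_paths: List[str], max_workers: int = 4) -> List[dict]:
    """Submit one task per file and collect non-empty results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(describe_file, path) for path in file_paths]
        # result() blocks until the task finishes and re-raises any exception
        # raised in the worker, so call it exactly once per future.
        results = [future.result() for future in futures]
    return [result for result in results if result is not None]


described = describe_files(["a.parquet", "notes.txt", "b.parquet"])
```

Iterating over the futures list (rather than `as_completed`) keeps the output in the same order as the input paths, which is usually what a caller adding files expects.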
   
   @kevinjqliu I would not classify this as a bugfix, so I'm not sure if this 
is appropriate for 0.8.1.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

