[I] Return dataset or list of created files from pyarrow.parquet.write_to_dataset() [arrow]

via GitHub Fri, 26 Apr 2024 12:35:04 -0700


alippai opened a new issue, #41399:
URL: https://github.com/apache/arrow/issues/41399


   ### Describe the enhancement requested
   
   Currently, we have to write the dataset and then read it back to get the 
dataset object or the list of files created. 
   This is undesirable when appending (writing new files) to an existing 
dataset, or when file operations are relatively expensive (e.g. on remote 
filesystems). 
   
   The docs give this example:
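   ```python
   import pyarrow.parquet as pq
   # `table` is an existing pyarrow.Table with a 'year' column
   pq.write_to_dataset(table, root_path='dataset_name_3',
                       partition_cols=['year'])
   # Re-open the dataset just to list the files that were written
   pq.ParquetDataset('dataset_name_3').files
   ```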
   Instead of the above, `write_to_dataset()` could return the dataset, so we could simply write:
   ```python
   import pyarrow.parquet as pq
   pq.write_to_dataset(table, root_path='dataset_name_3',
                       partition_cols=['year']).files
   ```
   
   ### Component(s)
   
   C++, Parquet, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
