[I] Python / Cython interface to C++ arrow::dataset::Partitioning::Format [arrow]

via GitHub Wed, 14 Aug 2024 01:44:23 -0700


Feiyang472 opened a new issue, #43684:
URL: https://github.com/apache/arrow/issues/43684


   ### Describe the enhancement requested
   
   Hi Arrow team,
   We use pyarrow for dataset partitioning. We want to compute the relative filesystem path that a given partitioning scheme and segment encoding would produce for a filter.
   
   For example, given the filter `("key", "=", "value value")`, we would like:
   - with hive partitioning: `/key=value value/`
   - with hive partitioning and URI segment encoding: `/key=value%20value/`
   - with directory partitioning: `/value value/`

   We currently compose these paths by hand, but we would like our code to stay correct if the Arrow implementation (or a subclass of it) changes how paths are formatted.
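The hand composition described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Arrow's `Partitioning::Format`: `format_segment` is a hypothetical helper, URI segment encoding is approximated with `urllib.parse.quote(value, safe="")` (Arrow's exact escaping rules may differ), and the defaults are chosen to match the examples above rather than pyarrow's own defaults.

```python
from urllib.parse import quote


def format_segment(key: str, value: str, flavor: str = "hive",
                   segment_encoding: str = "none") -> str:
    """Hypothetical helper: build the relative directory for ("key", "=", value).

    This only approximates Arrow's C++ Partitioning::Format by hand.
    """
    if segment_encoding == "uri":
        # Assumption: percent-encode every reserved character, including "/".
        value = quote(value, safe="")
    elif segment_encoding != "none":
        raise ValueError(f"unknown segment_encoding: {segment_encoding!r}")

    if flavor == "hive":
        # Hive partitioning writes "key=value" into each directory name.
        segment = f"{key}={value}"
    elif flavor == "directory":
        # Directory partitioning writes only the value; the key is implied
        # by the field's position in the partitioning schema.
        segment = value
    else:
        raise ValueError(f"unknown flavor: {flavor!r}")
    return f"/{segment}/"


print(format_segment("key", "value value"))                          # /key=value value/
print(format_segment("key", "value value", segment_encoding="uri"))  # /key=value%20value/
print(format_segment("key", "value value", flavor="directory"))      # /value value/
```

Any divergence between a helper like this and Arrow's own escaping would go unnoticed, which is why calling `Format` directly would be more robust.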
   
   To achieve this, we would really appreciate it if the C++ API
   ```
   arrow::dataset::Partitioning::Format
   ```
   could be exposed via Cython
   
https://github.com/apache/arrow/blob/712cfe6d84bd344cfe57a1e4c791f8a4d052c76d/python/pyarrow/includes/libarrow_dataset.pxd#L290
   
https://github.com/apache/arrow/blob/712cfe6d84bd344cfe57a1e4c791f8a4d052c76d/python/pyarrow/_dataset.pyx#L2492
   like the `arrow::dataset::Partitioning::Parse` method already is.
   
   Thanks in advance for any help or discussion!
   
   
   ### Component(s)
   
   Integration, Parquet, Python

