qzyu999 opened a new issue, #50131:
URL: https://github.com/apache/arrow/issues/50131

   ### Describe the enhancement requested
   
   ### **Component:**
   `Python`
   
   ### **Description:**
   
   The `arrow.parquet.variant` canonical extension type was added to Arrow C++ 
in GH-46104 (PR #45375). It represents semi-structured data as a struct with 
two binary fields, `metadata` and `value`.
   
   This issue tracks exposing the C++ `VariantExtensionType` class, along with 
its custom `VariantArray` and `VariantScalar` representations, to PyArrow. 
   
   Exposing this type to Python would let downstream projects inspect Variant 
schemas and work with Variant data programmatically.
   
   #### **Proposed Changes:**
   
   1.  **Cython Declarations (`python/pyarrow/includes/libarrow.pxd`)**:
       *   Declare C++ `VariantExtensionType`, `VariantArray`, and 
`VariantScalar` from `arrow/extension/parquet_variant.h`.
   2.  **Type & Class Bindings (`python/pyarrow/types.pxi`, `public-api.pxi`)**:
       *   Expose `VariantType` inheriting from `BaseExtensionType`.
       *   Register the mapping for `"arrow.parquet.variant"` in 
`pyarrow_wrap_data_type`.
       *   Provide a Python-level factory constructor 
`pyarrow.variant(storage_type)`.
   3.  **Array & Scalar Bindings (`python/pyarrow/array.pxi`, 
`python/pyarrow/scalar.pxi`)**:
       *   Implement `VariantArray` inheriting from `ExtensionArray`.
       *   Implement `VariantScalar` inheriting from `ExtensionScalar`.
       *   Provide access to underlying elements (or dictionary 
representations) via `as_py()` / `to_pydict()`.
   4.  **Testing**:
       *   Add comprehensive tests in 
`python/pyarrow/tests/test_extension_type.py` validating registration, 
round-trip IPC, pickle compatibility, array building, and scalar inspection.
   
   #### **Dependencies:**
   
   Exposing the type metadata and wrapping existing storage arrays does not 
depend on any encoding or decoding logic, but constructing arrays from Python 
objects and decoding values into Python objects requires the C++ binary 
encoding and decoding implementations. 
   
   Therefore, this issue is blocked by:
   *   **GH-45946** (PR #50121): [C++][Parquet] Variant decoding
   *   **GH-45947** (PR #50122): [C++][Parquet] Variant encoding
   
   ### Component(s)
   
   Python


