qzyu999 opened a new issue, #50132:
URL: https://github.com/apache/arrow/issues/50132

   ### Describe the enhancement requested
   
   ### **Description:**
   
   The C++ schema mapping between Parquet's `VARIANT` logical type and Arrow's 
`VariantExtensionType` (`arrow.parquet.variant`) is established by GH-46104. 
Once the C++ Variant encoder (GH-45947) and decoder (GH-45946) land, the 
underlying C++ Parquet reader and writer will be able to process actual 
Variant data payloads.
   
   This issue tracks Python-level Parquet integration and testing to ensure 
Python users can read and write Parquet files containing Variant columns 
seamlessly.
   
   #### **Proposed Changes:**
   
   1.  **Parquet Read/Write Pipeline Validation**:
       *   Ensure that `pyarrow.parquet.write_table` correctly serializes 
`VariantExtensionType` columns into Parquet files with the `VARIANT` logical 
type annotation.
       *   Ensure that `pyarrow.parquet.read_table` correctly deserializes 
Parquet `VARIANT` columns back into PyArrow `VariantArray` columns (rather than 
falling back to the raw binary-pair storage struct or raising an 
unsupported-type error).
   2.  **Metadata and Schema Inspection**:
       *   Verify that `pyarrow.parquet.read_schema` and 
`ParquetFile.schema_arrow` (the Arrow view of `ParquetFile.schema`) report the 
column type as `VariantExtensionType`.
   3.  **Integration Testing**:
       *   Add end-to-end tests in 
`python/pyarrow/tests/parquet/test_data_types.py` (or a dedicated integration 
test suite):
           *   Construct a `pyarrow.Table` containing a `VariantArray` (e.g., 
from nested dictionaries/lists).
           *   Write it to a file using `pyarrow.parquet.write_table`.
           *   Read the file back using `pyarrow.parquet.read_table` and assert 
that the types and values are identical to the original table.
           *   Test reading a reference Parquet file containing Variant data 
written by a different implementation (e.g., Go or Spark) to verify 
cross-language compatibility.
   
   #### **Dependencies:**
   
   This issue is blocked by:
   *   **GH-50131**: [Python] Bindings for Variant canonical extension type 
(exposing the Python `VariantType`, `VariantArray`, and `VariantScalar` classes)
   *   **GH-45946** (PR #50121): [C++][Parquet] Variant decoding
   *   **GH-45947** (PR #50122): [C++][Parquet] Variant encoding
   
   ### Component(s)
   
   Python, Parquet

