[I] [Python][Parquet] read_schema drops extension types (UUID returned as fixed_size_binary[16]) [arrow]

via GitHub Tue, 25 Nov 2025 08:43:22 -0800


Kuinox opened a new issue, #48254:
URL: https://github.com/apache/arrow/issues/48254


   ### Describe the bug, including details regarding any error messages, version, and platform.
   
    ### Summary
     UUID extension types are preserved in the table schema (and by `read_table`) but dropped by `pyarrow.parquet.read_schema`, creating an asymmetry between the table’s schema and the schema read from Parquet metadata.
   
     ### Steps to Reproduce
     ```python
     import pyarrow as pa
     import pyarrow.parquet as pq
     from pathlib import Path
     import tempfile
   
     data = [
         b'\xe4`\xf9p\x83QGN\xac\x7f\xa4g>\x4b\xa8\xcb',
         b'\x1et\x14\x95\xee\xd5C\xea\x9b\xd7s\xdc\x91BK\xaf',
         None,
     ]
     table = pa.table([pa.array(data, type=pa.uuid())], names=["ext"])
     print("table schema type:", table.schema.field("ext").type)  # extension<arrow.uuid>
   
     path = Path(tempfile.gettempdir()) / "uuid_ext_test.parquet"
     pq.write_table(table, path, store_schema=False)
   
     print("read_schema type:", pq.read_schema(path).field("ext").type)
     print("read_table schema type:", pq.read_table(path).schema.field("ext").type)
     ```
   
     ### Expected Behavior
   
     `read_schema(path)` should yield the same type as the table schema (and `read_table`), i.e., `extension<arrow.uuid>`.
   
     ### Actual Behavior
   
     `read_schema(path)` returns `fixed_size_binary[16]`, while the original `table.schema` and `read_table(path).schema` both report `extension<arrow.uuid>`, so metadata-based schema inspection drops the extension type.
   
     ### Notes
   
     - Observed with the current pyarrow wheel (22.0.0) and with current `main` sources.
     - `ParquetFile(...).schema_arrow` preserves the extension type; `read_schema` does not.
   
   ### Component(s)
   
   Python, Parquet


