mhaseeb123 opened a new issue, #47336:
URL: https://github.com/apache/arrow/issues/47336

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ## Short Description
   Arrow's Parquet writer does not update the `unit` of timestamp columns in 
the base64-encoded `ARROW:schema` metadata when those columns have been cast 
via the `coerce_timestamps` argument. Reading such a file is then inconsistent 
and confusing, because the two schemas (Parquet and Arrow) record different 
timestamp units.
   
   Arrow's Parquet reader takes the timestamp unit from the Parquet schema and 
only extracts timezone information from `ARROW:schema`, if available, whereas 
the Parquet reader in Polars uses `ARROW:schema` directly when it is available 
and ignores the Parquet schema.
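
   To make the difference concrete, here is a toy model of the two 
resolution strategies described above. It is only a sketch: it uses no Arrow 
or Polars code, and `TimestampType`, `resolve_parquet_first`, 
`resolve_embedded_first` and `decode` are hypothetical names invented for 
illustration. It also assumes a reader that trusts the embedded unit would 
interpret the stored integers in that unit without rescaling them, which may 
not match what any real reader does.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Ticks per second for each timestamp unit.
TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class TimestampType:
    """Hypothetical stand-in for a timestamp type: a unit plus an optional timezone."""
    unit: str
    tz: Optional[str] = None


def resolve_parquet_first(parquet: TimestampType,
                          embedded: Optional[TimestampType]) -> TimestampType:
    """Unit from the Parquet schema; only the timezone from the embedded schema."""
    tz = embedded.tz if embedded is not None else parquet.tz
    return TimestampType(parquet.unit, tz)


def resolve_embedded_first(parquet: TimestampType,
                           embedded: Optional[TimestampType]) -> TimestampType:
    """Use the embedded schema as-is when present, else fall back to Parquet."""
    return embedded if embedded is not None else parquet


def decode(raw: int, ts: TimestampType) -> datetime:
    """Interpret a stored integer as an instant, assuming it is expressed in ts.unit."""
    micros = raw * 10**6 // TICKS_PER_SECOND[ts.unit]
    return EPOCH + timedelta(microseconds=micros)


# A timestamp[ns, UTC] column written with coerce_timestamps="ms":
# the stored integers and the Parquet schema are in ms,
# while the embedded schema still says ns.
stored = 1_704_067_200_000  # 2024-01-01T00:00:00Z in milliseconds
parquet_type = TimestampType("ms")
embedded_type = TimestampType("ns", "UTC")

a = resolve_parquet_first(parquet_type, embedded_type)
b = resolve_embedded_first(parquet_type, embedded_type)
print(a, decode(stored, a))
print(b, decode(stored, b))
```

   Under these assumptions the two strategies disagree on the unit for the 
same file, and the second one lands on an instant shortly after the epoch 
instead of in 2024.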
   
   ## Expected Behavior
   Either Arrow's Parquet writer should also update the timestamp unit of the 
cast columns in `ARROW:schema` for consistency, OR Arrow's Parquet reader 
should prefer the unit from `ARROW:schema` over the one in the Parquet schema 
(faithful reconstruction of units + timezone?).
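
   The writer-side option could look roughly like the sketch below. It is a 
toy model over plain dictionaries, not Arrow's implementation: 
`embedded_schema_after_coercion` and the `(unit, timezone)` tuple 
representation are assumptions made for illustration only.

```python
from typing import Dict, Optional, Tuple

TsType = Tuple[str, Optional[str]]  # (unit, timezone)


def embedded_schema_after_coercion(original: Dict[str, TsType],
                                   coerce_to: Optional[str]) -> Dict[str, TsType]:
    """Return the timestamp columns as they would be recorded in the embedded schema.

    When a coercion unit is given, every timestamp column takes that unit,
    so the embedded schema agrees with the Parquet schema; timezones are kept.
    """
    if coerce_to is None:
        return dict(original)
    return {name: (coerce_to, tz) for name, (_unit, tz) in original.items()}


in_memory = {"ts": ("ns", "UTC")}
written = embedded_schema_after_coercion(in_memory, "ms")
print(written)
```

   The trade-off is that the original in-memory unit is no longer recorded in 
the file, which is what the reader-side option would preserve.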
   
   ## Related Issues
   https://github.com/pola-rs/polars/issues/21392 and likely 
https://github.com/pola-rs/polars/issues/23949
   
   Thank you
   
   ### Component(s)
   
   C++, Parquet

