mhaseeb123 opened a new issue, #47336: URL: https://github.com/apache/arrow/issues/47336
### Describe the bug, including details regarding any error messages, version, and platform.

## Short Description

Arrow's Parquet writer does not update the `unit` of timestamp columns in the base64-encoded `ARROW:schema` metadata it writes when those columns have been cast using the `coerce_timestamps` argument. The two schemas stored in the file (Parquet and Arrow) then carry different timestamp units, which leads to inconsistent behavior and confusion when the file is read. Arrow's Parquet reader takes the timestamp unit from the Parquet schema and only extracts time zone information from `ARROW:schema` when it is available, whereas the Parquet reader in Polars uses `ARROW:schema` directly when it is available and ignores the Parquet schema.

## Expected Behavior

Either Arrow's Parquet writer should also update the timestamp unit of the cast columns in `ARROW:schema` for consistency, or Arrow's Parquet reader should take the unit from `ARROW:schema` in preference to the Parquet schema (faithful reconstruction of units + timezone?).

## Related Issues

https://github.com/pola-rs/polars/issues/21392 and likely https://github.com/pola-rs/polars/issues/23949

Thank you

### Component(s)

C++, Parquet
