syun64 commented on issue #936: URL: https://github.com/apache/iceberg-python/issues/936#issuecomment-2230873055
Hi @HonahX, thank you for raising this issue! Having worked on type casting PRs recently, I found that this one piqued my interest. It looks like a [recently merged PR](https://github.com/apache/arrow/pull/42169) exposed this C++ Arrow feature to the Python bindings through the `store_decimal_as_integer` flag of `pyarrow.parquet.write_table`, which was released in version 17.0.0:

```
>>> import pyarrow as pa
>>> schema = pa.schema([pa.field('decimal', pa.decimal128(precision=2))])
>>> table = pa.Table.from_pydict({"decimal": [1.1, 2.2]})
>>> table.cast(schema)
pyarrow.Table
decimal: decimal128(2, 0)
----
decimal: [[1,2]]
>>> import pyarrow.parquet as pq
>>> pq.write_table(table.cast(schema), "test.parquet", store_decimal_as_integer=True)
>>> pq.read_metadata("test.parquet")
<pyarrow._parquet.FileMetaData object at 0x730658de3600>
  created_by: parquet-cpp-arrow version 17.0.0
  num_columns: 1
  num_rows: 2
  num_row_groups: 1
  format_version: 2.6
  serialized_size: 377
>>> pq.read_metadata("test.parquet").schema
<pyarrow._parquet.ParquetSchema object at 0x730656702200>
required group field_id=-1 schema {
  optional int32 field_id=-1 decimal (Decimal(precision=2, scale=0));
```

I just checked with the latest release, `17.0.0`, and confirmed that the Parquet physical types are being written as integers: the `decimal` column above is stored as `int32` with a `Decimal(precision=2, scale=0)` annotation.