HonahX opened a new issue, #936:
URL: https://github.com/apache/iceberg-python/issues/936

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   According to the [Parquet data type mappings spec](https://iceberg.apache.org/spec/#parquet), `DecimalType` should map to `INT32` when `precision <= 9`, `INT64` when `precision <= 18`, and `fixed` otherwise.
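For reference, the spec's mapping can be written as a small function. This is a minimal sketch, not PyIceberg code. The helper names `expected_parquet_physical_type` and `min_bytes_for_precision` are hypothetical. The `fixed` width assumes the spec's rule of using the minimum number of bytes that can store `P`. Here that means the smallest signed two's-complement width that holds `10**precision - 1`.

```python
def min_bytes_for_precision(precision: int) -> int:
    """Smallest byte width whose signed range holds 10**precision - 1 (assumed sizing rule)."""
    num_bytes = 1
    # An n-byte signed integer holds values up to 2**(8n - 1) - 1
    while 2 ** (8 * num_bytes - 1) < 10 ** precision:
        num_bytes += 1
    return num_bytes


def expected_parquet_physical_type(precision: int) -> str:
    """Hypothetical helper: Parquet physical type the spec expects for decimal(precision, scale)."""
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return f"FIXED_LEN_BYTE_ARRAY({min_bytes_for_precision(precision)})"


print(expected_parquet_physical_type(7))   # INT32, the case in the repro below
print(expected_parquet_physical_type(38))  # FIXED_LEN_BYTE_ARRAY(16)
```

Under this rule, the `DecimalType(7, 0)` column in the repro would be stored as `INT32` rather than `FIXED_LEN_BYTE_ARRAY`.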
   
   However, Arrow currently writes all decimal types as `fixed` (`FIXED_LEN_BYTE_ARRAY`) in Parquet. This may not be a big issue, since the logical type is still correct, and fixing it may require upstream support:
   
   - https://github.com/apache/arrow/issues/38882
   
   Update: thanks @syun64 for providing the link to the upstream PR that fixes this:
   
   - https://github.com/apache/arrow/pull/42169
   
   
   Simple test:
   ```python
   from pyiceberg.catalog import load_catalog
   from pyiceberg.schema import Schema
   from pyiceberg.types import DecimalType, NestedField
   import pyarrow as pa
   
   # Catalog properties elided
   rest_catalog = load_catalog(
       "rest",
       **{
           ...
       },
   )
   
   # Iceberg schema for reference; create_table below is given the equivalent Arrow schema
   decimal_schema = Schema(NestedField(1, "decimal", DecimalType(7, 0)))
   decimal_arrow_schema = pa.schema(
       [
           ("decimal", pa.decimal128(7, 0)),
       ]
   )
   
   # One row; precision 7 <= 9, so the spec expects this column to be stored as INT32
   decimal_arrow_table = pa.Table.from_pylist(
       [
           {
               "decimal": 123,
           }
       ],
       schema=decimal_arrow_schema,
   )
   
   tbl = rest_catalog.create_table(
       "pyiceberg_test.test_decimal_type", schema=decimal_arrow_schema
   )
   
   # Writes the Parquet data file inspected below
   tbl.append(decimal_arrow_table)
   
   ```
   ```
   > parquet-tools inspect 00000-0-bff20a80-0e80-4b53-ba35-2c94498fa507.parquet
   
   ############ file meta data ############
   created_by: parquet-cpp-arrow version 16.1.0
   num_columns: 1
   num_rows: 1
   num_row_groups: 1
   format_version: 2.6
   serialized_size: 465
   
   
   ############ Columns ############
   decimal
   
   ############ Column(decimal) ############
   name: decimal
   path: decimal
   max_definition_level: 1
   max_repetition_level: 0
   physical_type: FIXED_LEN_BYTE_ARRAY
   logical_type: Decimal(precision=7, scale=0)
   converted_type (legacy): DECIMAL
   compression: ZSTD (space_saved: -25%)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
