berg2043 opened a new issue, #2263:
URL: https://github.com/apache/iceberg-python/issues/2263

   ### Apache Iceberg version
   
   0.9.1 (latest release)
   
   ### Please describe the bug 🐞
   
   After applying the fix from #1983 for decimal conversion, `conversion from NoneType to Decimal is not supported` is thrown if a decimal column contains only nulls. Here's a snippet of code to replicate the issue:
   
   ```
    from decimal import Decimal

    import pyarrow as pa
    from pyiceberg.catalog import load_catalog
    from pyiceberg.io.pyarrow import pyarrow_to_schema
    from pyiceberg.table.name_mapping import MappedField, NameMapping
   
   
   warehouse_path = '/home/pberg/Development/pyiceberg_debug'
   
    catalog = load_catalog(
        "default",
        type="sql",
        uri=f"sqlite:///{warehouse_path}/test",
        warehouse=f"file://{warehouse_path}",
    )
   
    catalog.create_namespace_if_not_exists(
        'test',
        {'location': f'file://{warehouse_path}'}
    )
   
    decimal8 = pa.array([Decimal("123.45"), Decimal("678.91")], pa.decimal128(8, 2))
    decimal16 = pa.array([Decimal("12345679.123456"), Decimal("67891234.678912")], pa.decimal128(16, 6))
    decimal19 = pa.array([Decimal("1234567890123.123456"), Decimal("9876543210703.654321")], pa.decimal128(19, 6))
    empty_decimal8 = pa.array([None, None], pa.decimal128(8, 2))
    empty_decimal16 = pa.array([None, None], pa.decimal128(16, 6))
    empty_decimal19 = pa.array([None, None], pa.decimal128(19, 6))
   
   table = pa.Table.from_pydict(
       {
           "decimal8": decimal8,
           "decimal16": decimal16,
           "decimal19": decimal19,
           "empty_decimal8": empty_decimal8,
           "empty_decimal16": empty_decimal16,
           "empty_decimal19": empty_decimal19,
       },
   )
   
   pa_schema = table.schema
   
    name_mapping = NameMapping([
        MappedField(**{'field-id': i + 1, 'names': [name]})
        for i, name in enumerate(pa_schema.names)
    ])

    schema = pyarrow_to_schema(
        pa_schema,
        name_mapping,
    )

    pyiceberg_table = catalog.create_table(
        'test.decimals',
        schema=table.schema,
    )
   
   pyiceberg_table.append(table)
   ```
   
   My current fix to `data_file_statistics_from_parquet_metadata` is as follows, but I'm unsure what unintended consequences it might have:
   ```
    if isinstance(stats_col.iceberg_type, DecimalType) and statistics.physical_type != "FIXED_LEN_BYTE_ARRAY":
        scale = stats_col.iceberg_type.scale
        if statistics.min_raw:
            col_aggs[field_id].update_min(unscaled_to_decimal(statistics.min_raw, scale))
        if statistics.max_raw:
            col_aggs[field_id].update_max(unscaled_to_decimal(statistics.max_raw, scale))
   ```
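   For context, here is a quick standalone check with plain pyarrow (the column names and layout here are my own, not from pyiceberg) suggesting why the guard above is needed: an all-null column produces a Parquet column chunk whose statistics carry no min/max pair, so `min_raw`/`max_raw` come back as `None`.

   ```python
    # Sketch: inspect Parquet column-chunk statistics for an all-null
    # decimal column. Uses only pyarrow; names are illustrative.
    import io
    from decimal import Decimal

    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pydict(
        {
            "filled": pa.array([Decimal("123.45")], pa.decimal128(8, 2)),
            "empty": pa.array([None], pa.decimal128(8, 2)),
        }
    )

    buf = io.BytesIO()
    pq.write_table(table, buf)
    meta = pq.read_metadata(io.BytesIO(buf.getvalue()))
    row_group = meta.row_group(0)

    has_min_max = {}
    for i in range(row_group.num_columns):
        col = row_group.column(i)
        stats = col.statistics
        # For the all-null column the chunk either has no statistics
        # object at all, or one without a min/max pair - in which case
        # min_raw/max_raw are None.
        has_min_max[col.path_in_schema] = bool(stats and stats.has_min_max)

    print(has_min_max)
   ```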
   
   I could not get the nightly build to install, so I'm unsure whether this bug still exists on the main branch.
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time

