berg2043 opened a new issue, #2263:
URL: https://github.com/apache/iceberg-python/issues/2263
### Apache Iceberg version
0.9.1 (latest release)
### Please describe the bug 🐞
After applying the fix from #1983 for decimal conversion, the error
"conversion from NoneType to Decimal is not supported" is raised if a decimal
column contains only nulls. Here's a snippet of code to reproduce it:
```
from decimal import Decimal

import pyarrow as pa

from pyiceberg.io.pyarrow import pyarrow_to_schema
from pyiceberg.schema import Schema
from pyiceberg.types import DecimalType, NestedField
from pyiceberg.catalog import Catalog, load_catalog
from pyiceberg.table.name_mapping import MappedField, NameMapping

warehouse_path = '/home/pberg/Development/pyiceberg_debug'
catalog = load_catalog(
    "default",
    type = "sql",
    uri = f"sqlite://///{warehouse_path}/test",
    warehouse = f'file://{warehouse_path}',
)
catalog.create_namespace_if_not_exists(
    'test',
    {'location': f'file://{warehouse_path}'}
)

# Populated decimal columns at three precisions.
decimal8 = pa.array([Decimal("123.45"), Decimal("678.91")], pa.decimal128(8, 2))
decimal16 = pa.array([Decimal("12345679.123456"), Decimal("67891234.678912")], pa.decimal128(16, 6))
decimal19 = pa.array([Decimal("1234567890123.123456"), Decimal("9876543210703.654321")], pa.decimal128(19, 6))

# All-null decimal columns with the same types.
empty_decimal8 = pa.array([None, None], pa.decimal128(8, 2))
empty_decimal16 = pa.array([None, None], pa.decimal128(16, 6))
empty_decimal19 = pa.array([None, None], pa.decimal128(19, 6))

table = pa.Table.from_pydict(
    {
        "decimal8": decimal8,
        "decimal16": decimal16,
        "decimal19": decimal19,
        "empty_decimal8": empty_decimal8,
        "empty_decimal16": empty_decimal16,
        "empty_decimal19": empty_decimal19,
    },
)

pa_schema = table.schema
name_mapping = NameMapping([
    MappedField(**{'field-id': i+1, 'names': [name]})
    for i, name
    in enumerate(pa_schema.names)
])
schema = pyarrow_to_schema(
    pa_schema,
    name_mapping
)

pyiceberg_table = catalog.create_table(
    'test.decimals',
    schema=table.schema,
)
# The append is the last step of the reproduction.
pyiceberg_table.append(table)
```
My current fix to `data_file_statistics_from_parquet_metadata` is as follows,
but I'm unsure what unintended consequences it might have.
```
if (
    isinstance(stats_col.iceberg_type, DecimalType)
    and statistics.physical_type != "FIXED_LEN_BYTE_ARRAY"
):
    scale = stats_col.iceberg_type.scale
    # Guard added: skip the conversion when the raw bound is falsy
    # (e.g. None for an all-null column).
    if statistics.min_raw:
        col_aggs[field_id].update_min(unscaled_to_decimal(statistics.min_raw, scale))
    if statistics.max_raw:
        col_aggs[field_id].update_max(unscaled_to_decimal(statistics.max_raw, scale))
```
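One possible unintended consequence of the truthiness checks above: assuming the raw statistic is an unscaled integer for decimals that are not stored as FIXED_LEN_BYTE_ARRAY, a legitimate bound of 0 is falsy and would be skipped along with `None`. Below is a minimal sketch of an explicit `None` check that uses only the standard library. `unscaled_to_decimal` here is a local stand-in written for illustration (an assumption, not the pyiceberg implementation), and `safe_bound` is a hypothetical helper name.

```python
from decimal import Decimal
from typing import Optional


def unscaled_to_decimal(unscaled: int, scale: int) -> Decimal:
    # Local stand-in (assumption): rebuild a Decimal from an unscaled
    # integer by reusing its sign and digits with exponent -scale.
    sign, digits, _ = Decimal(unscaled).as_tuple()
    return Decimal((sign, digits, -scale))


def safe_bound(raw: Optional[int], scale: int) -> Optional[Decimal]:
    # Hypothetical helper: skip only when the statistic is truly missing,
    # so an unscaled value of 0 is still converted.
    if raw is None:
        return None
    return unscaled_to_decimal(raw, scale)


# Passing None straight to Decimal reproduces the reported error message.
try:
    Decimal(None)
except TypeError as exc:
    error_message = str(exc)

missing = safe_bound(None, 2)   # no statistic available, nothing to update
zero = safe_bound(0, 2)         # a real bound of 0 is preserved
regular = safe_bound(12345, 2)  # unscaled 12345 with scale 2
```

With a guard of this shape, the caller would update the min or max aggregate only when the helper returns a non-`None` value. Whether a check on the statistics object itself would be preferable is a question for the maintainers.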
I could not get the nightly build to install, so I'm unsure whether the bug
still exists there.
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time