CaptainEureka opened a new issue, #2057: URL: https://github.com/apache/iceberg-python/issues/2057
### Apache Iceberg version

0.9.1 (latest release)

### Please describe the bug 🐞

When attempting to add Parquet files to an Iceberg table using `Table.add_files`, the operation fails if a column defined as `DecimalType` in the Iceberg schema is physically stored as `FIXED_LEN_BYTE_ARRAY` in the Parquet file, *even if* the decimal's precision would typically map to `INT32` or `INT64` according to Iceberg's preferred Parquet mapping.

I see in the Iceberg Spec that on write the mapping is correct. However, the current behaviour seems to overly restrict the physical Parquet type for decimals during the file addition process. I believe this greatly limits the *kinds* of Parquet files that can be "added" to an Iceberg table this way.

**Steps to Reproduce:**

1. Define an Iceberg table schema with a `DecimalType` column, for example, `Decimal(10, 2)`.
   * Iceberg's preferred Parquet physical type for `Decimal(10, 2)` would be `INT64`.
2. Create a Parquet file where the corresponding column for this `Decimal(10, 2)` is physically stored as `FIXED_LEN_BYTE_ARRAY`. The data itself is valid for `Decimal(10, 2)`.
3. Attempt to add this Parquet file to the Iceberg table using `Table.add_files`.

**Behavior:**

The `Table.add_files` operation fails, with the following error:

```sh
ValueError: Unexpected physical type FIXED_LEN_BYTE_ARRAY for DecimalType(10, 2) expected INT32
```

indicating a mismatch between the expected physical type (e.g., `INT64`) and the actual physical type (`FIXED_LEN_BYTE_ARRAY`) found in the Parquet file for the decimal column.

**Expected Behavior:**

The `Table.add_files` operation should succeed and correctly read the decimal values from the `FIXED_LEN_BYTE_ARRAY` physical storage. The Iceberg reader/writer should be lenient with the physical storage format of decimals OR otherwise `Table.add_files` should document these limitations.

**Environment:**

* Python version: 3.12.9
* Parquet library and version: pyarrow 20.0.0

P.S.
If this is just user error and I shouldn't be trying to do things this way, I'd be happy to hear alternatives.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
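
To make the expected behavior concrete, here is a minimal, library-independent sketch of the kind of leniency described above. It is not PyIceberg's implementation: the function names (`preferred_physical_type`, `is_acceptable_physical_type`, `decode_fixed_decimal`) are hypothetical and exist only for illustration. The precision thresholds follow the Iceberg spec's Parquet mapping for `decimal(P, S)` (`P <= 9` as `INT32`, `P <= 18` as `INT64`, fixed-length bytes otherwise), and the decoding assumes the Parquet convention that a decimal stored as `FIXED_LEN_BYTE_ARRAY` holds the unscaled value as a big-endian two's-complement integer.

```python
from decimal import Decimal


def preferred_physical_type(precision: int) -> str:
    """Preferred Parquet physical type for decimal(P, S) when writing."""
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return "FIXED_LEN_BYTE_ARRAY"


def is_acceptable_physical_type(precision: int, physical_type: str) -> bool:
    """Hypothetical lenient check for files that were written elsewhere.

    Accepts the preferred type, and additionally FIXED_LEN_BYTE_ARRAY for
    any precision, since that encoding can represent every decimal.
    """
    allowed = {preferred_physical_type(precision), "FIXED_LEN_BYTE_ARRAY"}
    return physical_type in allowed


def decode_fixed_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode a big-endian two's-complement unscaled value into a Decimal."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # The tuple constructor is exact and ignores the context precision.
    return Decimal((sign, digits, -scale))


# Decimal(10, 2): INT64 is preferred, but the fixed-length form is accepted too.
print(preferred_physical_type(10))
print(is_acceptable_physical_type(10, "FIXED_LEN_BYTE_ARRAY"))
print(decode_fixed_decimal((12345).to_bytes(5, "big", signed=True), scale=2))
```

Under this sketch, a strict check would compare only against `preferred_physical_type(precision)`, which matches the failure reported above, while the lenient check also lets the `FIXED_LEN_BYTE_ARRAY` file through.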