jordepic opened a new pull request, #2647: URL: https://github.com/apache/iceberg-rust/pull/2647
- Closes #2617.

## What changes are included in this PR?

When a table's struct (or nested list/map) column has gained fields over time via schema evolution, reading data files written under the older schema fails with an Arrow cast error such as `Cast error: Casting from Utf8 to Struct(...)`. These files are valid and are readable by Iceberg-Java/Spark.

The record-batch transformer reconciles a file's nested children to the table schema by position within the struct rather than by Iceberg field id. Once a nested struct adds a field, the children no longer line up and a mismatched cast is attempted (e.g. casting a string child into a struct slot). For example, if a struct evolves from `a, c` to `a, b, c`, reading an old file that contains only `a, c` tries to cast `c` to the type of `b`.

This change fixes the bug by replacing the flat cast with `promote_array_to_target`, which walks the target type and matches nested struct children by `PARQUET:field_id`, filling fields absent from the file with typed NULLs and recursing through list/large-list/map. Primitives still use `cast` for valid Iceberg promotions. This mirrors iceberg-java's by-field-id nested readers.

## Are these changes tested?

Yes. Unit tests are included to ensure that nested fields are now properly reconciled when they are present in the schema but not in the data file itself.
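
To illustrate the difference between positional and by-field-id matching described in this PR, here is a minimal, std-only sketch. It is not the code from this PR and does not use the arrow-rs API: `Ty`, `Field`, `Value`, `MatchBy`, `field`, and `promote` are hypothetical stand-ins that model one row of a nested struct, and list/map recursion and primitive promotions are left out.

```rust
/// Hypothetical stand-in for an Arrow data type (not the arrow-rs `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Str,
    Struct(Vec<Field>),
}

/// Hypothetical stand-in for a schema field; `id` plays the role of `PARQUET:field_id`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: String,
    ty: Ty,
}

/// Hypothetical stand-in for a single row value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Str(String),
    Struct(Vec<Value>),
}

/// How struct children in the file are paired with children in the target.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum MatchBy {
    Position,
    FieldId,
}

/// Small constructor to keep the example schemas readable.
#[allow(dead_code)]
fn field(id: i32, name: &str, ty: Ty) -> Field {
    Field { id, name: name.to_string(), ty }
}

/// Reconcile `value` (laid out as `file_ty`) to `target_ty`.
/// Target fields with no source child are filled with `Value::Null`.
#[allow(dead_code)]
fn promote(value: &Value, file_ty: &Ty, target_ty: &Ty, mode: MatchBy) -> Result<Value, String> {
    match (value, file_ty, target_ty) {
        (Value::Null, _, _) => Ok(Value::Null),
        (Value::Struct(children), Ty::Struct(file_fields), Ty::Struct(target_fields)) => {
            let mut out = Vec::with_capacity(target_fields.len());
            for (pos, target) in target_fields.iter().enumerate() {
                // Pick the index of the file child that feeds this target slot.
                let source = match mode {
                    MatchBy::Position => (pos < file_fields.len()).then_some(pos),
                    MatchBy::FieldId => file_fields.iter().position(|f| f.id == target.id),
                };
                match source {
                    Some(i) => {
                        let child = children
                            .get(i)
                            .ok_or_else(|| format!("row has no child at index {i}"))?;
                        out.push(promote(child, &file_fields[i].ty, &target.ty, mode)?);
                    }
                    None => out.push(Value::Null),
                }
            }
            Ok(Value::Struct(out))
        }
        // Identical types pass through unchanged; real code would also
        // apply valid primitive promotions here.
        (v, f, t) if f == t => Ok(v.clone()),
        (_, f, t) => Err(format!("Casting from {f:?} to {t:?}")),
    }
}
```

With a file struct `a (id 1), c (id 2)` and a target struct `a (id 1), b (id 3), c (id 2)`, `MatchBy::Position` pairs the file's `c` with the target's `b` and returns an error, while `MatchBy::FieldId` yields `a`, a null `b`, and `c`.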
