jordepic opened a new pull request, #2647:
URL: https://github.com/apache/iceberg-rust/pull/2647

   - Closes #2617.
   
   ## What changes are included in this PR?
   
   When a table's struct (or nested list/map) column has gained fields over time through schema evolution, reading data files written under the older schema fails with an Arrow cast error such as `Cast error: Casting from Utf8 to Struct(...)`. The record-batch transformer reconciles a file's nested children with the table schema by their position within the struct rather than by Iceberg field ID. Once a nested struct gains a field, the children no longer line up, and a mismatched cast is attempted (for example, casting a string child into a struct slot). The files themselves are valid and can be read by Iceberg-Java/Spark.
   
   For example, if a struct evolves from `{a, c}` to `{a, b, c}`, reading an old file that contains only `a` and `c` attempts to cast `c` to the type of `b`.
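   The failure mode can be made concrete with a minimal sketch in plain Rust with no Arrow dependency. It models struct children as `(field id, name, type)` triples. The names `Ty`, `Field`, `mismatches_by_position`, and `pair_by_id` are hypothetical illustrations, not iceberg-rust APIs, and the field IDs 1, 2, 3 for `a`, `b`, `c` are assumed.

```rust
/// Hypothetical, simplified nested type: only enough to show how struct
/// children get paired. These are not the iceberg-rust or arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Utf8,
    Struct(Vec<Field>),
}

/// A struct child tagged with its (assumed) Iceberg field id.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: &'static str,
    ty: Ty,
}

/// Buggy strategy: pair the i-th file child with the i-th table child and
/// report every (file child, table child) pair whose types disagree.
fn mismatches_by_position(file: &[Field], table: &[Field]) -> Vec<(&'static str, &'static str)> {
    file.iter()
        .zip(table.iter())
        .filter(|(f, t)| f.ty != t.ty)
        .map(|(f, t)| (f.name, t.name))
        .collect()
}

/// Fixed strategy: for each table child, look up the file child with the
/// same field id; `None` means the file predates that field.
fn pair_by_id(file: &[Field], table: &[Field]) -> Vec<(&'static str, Option<&'static str>)> {
    table
        .iter()
        .map(|t| (t.name, file.iter().find(|f| f.id == t.id).map(|f| f.name)))
        .collect()
}
```

   Suppose the file contains `{a: int, c: string}` and the table contains `{a: int, b: struct, c: string}`. The positional version then reports the pair `("c", "b")`, which is the `Utf8`-to-`Struct` cast from the error message. The by-ID version instead pairs `a` and `c` with themselves and reports `b` as absent.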
   
   This change fixes the bug.
   
   Replace the flat `cast` with `promote_array_to_target`. It walks the target type and matches nested struct children by `PARQUET:field_id`. Fields absent from the file are filled with typed NULLs, and it recurses through list, large-list, and map types. Primitives still go through `cast` for valid Iceberg type promotions. This mirrors iceberg-java's nested readers, which also match children by field ID.
   
   ## Are these changes tested?
   
   Yes. Unit tests verify that nested fields present in the table schema but absent from the data file are now reconciled correctly.

