glitchy opened a new pull request, #2655:
URL: https://github.com/apache/iceberg-rust/pull/2655

   ## Which issue does this PR close?
   
   Closes #2654
   
   `ParquetWriter` matches record batch columns to Iceberg fields by field id, 
reading the `PARQUET:field_id` metadata off each Arrow field. When a caller 
hands it a record batch whose Arrow schema was built by hand, without that 
metadata, the write fails deep in value extraction with an opaque `Field id N 
not found in struct array` error, which points at the symptom 
(`arrow/value.rs`) rather than the cause.
   
   This PR fails fast at the writer boundary instead. When matching by field 
id (`FieldMatchMode::Id`), the incoming record batch's Arrow schema is 
validated on the first write. If any field lacks `PARQUET:field_id`, a clear 
`DataInvalid` error is returned that names the offending field(s) and points 
the caller to derive the Arrow schema via 
`current_schema().as_ref().try_into()`.
   
   - Purely additive: schemas that already carry field ids are unaffected; 
only hand-built schemas missing them now fail early with an actionable message.
   - Recurses into nested struct/list fields. Skips the Arrow map `entries` 
wrapper, which has no Iceberg field id of its own (only its key and value 
fields do).
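   
   As a rough illustration of the check described above (not the code in this 
PR), the traversal can be sketched with a simplified, hypothetical `ToyField` 
type standing in for Arrow's field type. Everything here except the 
`PARQUET:field_id` key is an assumption made for the example: the type names, 
the `validate_field_ids` helper, and the error wording.

```rust
use std::collections::HashMap;

/// Metadata key that carries the Iceberg field id on an Arrow field.
const FIELD_ID_KEY: &str = "PARQUET:field_id";

/// Hypothetical, simplified stand-in for an Arrow field (NOT the real Arrow type).
#[derive(Debug, Clone)]
struct ToyField {
    name: String,
    metadata: HashMap<String, String>,
    kind: ToyKind,
}

/// Hypothetical, simplified stand-in for the Arrow data types that matter here.
#[derive(Debug, Clone)]
enum ToyKind {
    Primitive,
    Struct(Vec<ToyField>),
    List(Box<ToyField>),
    /// Holds the `entries` wrapper field, whose children are the key and value.
    Map(Box<ToyField>),
}

/// Convenience constructor for the example; `id` of `None` means "no field id metadata".
fn field(name: &str, id: Option<i32>, kind: ToyKind) -> ToyField {
    let mut metadata = HashMap::new();
    if let Some(id) = id {
        metadata.insert(FIELD_ID_KEY.to_string(), id.to_string());
    }
    ToyField { name: name.to_string(), metadata, kind }
}

/// Walks one field, recording the dotted path of every field lacking a field id.
/// `check_self` is false only for the map `entries` wrapper, which has no id of its own.
fn collect_missing(f: &ToyField, path: &str, check_self: bool, out: &mut Vec<String>) {
    let full = if path.is_empty() {
        f.name.clone()
    } else {
        format!("{}.{}", path, f.name)
    };
    if check_self && !f.metadata.contains_key(FIELD_ID_KEY) {
        out.push(full.clone());
    }
    match &f.kind {
        ToyKind::Primitive => {}
        ToyKind::Struct(children) => {
            for child in children {
                collect_missing(child, &full, true, out);
            }
        }
        ToyKind::List(element) => collect_missing(element, &full, true, out),
        // Skip the wrapper itself, but still descend to its key/value children.
        ToyKind::Map(entries) => collect_missing(entries, &full, false, out),
    }
}

/// Returns an error naming every field that is missing the field id metadata.
fn validate_field_ids(fields: &[ToyField]) -> Result<(), String> {
    let mut missing = Vec::new();
    for f in fields {
        collect_missing(f, "", true, &mut missing);
    }
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "fields missing `{}` metadata: {}",
            FIELD_ID_KEY,
            missing.join(", ")
        ))
    }
}
```

   In this sketch, a map column whose `entries` wrapper has no id but whose 
key and value do passes validation, while a top-level column without the 
metadata is reported by name.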
   
   Reported by @malon64 while testing #2185 from a downstream Rust ingestion 
tool.
   
   ## Are these changes tested?
   
   New unit test `test_parquet_writer_rejects_schema_without_field_ids` in 
`writer::file_writer::parquet_writer::tests`: it builds a record batch whose 
Arrow schema lacks `PARQUET:field_id` and asserts that the write fails with a 
`DataInvalid` error naming the missing metadata key. Existing writer tests, 
including the complex/map schema test, continue to pass.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

