glitchy opened a new pull request, #2655: URL: https://github.com/apache/iceberg-rust/pull/2655
## Which issue does this PR close?

Closes #2654

`ParquetWriter` matches record batch columns to Iceberg fields by field id, reading the `PARQUET:field_id` metadata off each Arrow field. When a caller hands it a record batch whose Arrow schema was built by hand, without that metadata, the write fails deep in value extraction with an opaque `Field id N not found in struct array`, which points at the symptom (`arrow/value.rs`) rather than the cause.

This PR fails fast at the writer boundary instead. When matching by field id (`FieldMatchMode::Id`), the incoming record batch's Arrow schema is validated on the first write, and a clear `DataInvalid` error is returned naming the field(s) missing `PARQUET:field_id`, with a pointer to derive the schema via `current_schema().as_ref().try_into()`.

- Purely additive: schemas built the right way are unaffected; only malformed hand-built schemas now fail early with an actionable message.
- Recurses into nested struct/list fields. Skips the Arrow map `entries` wrapper, which has no Iceberg field id of its own (only its key/value do).

Reported by @malon64 while testing #2185 from a downstream Rust ingestion tool.

## Are these changes tested?

New unit test `test_parquet_writer_rejects_schema_without_field_ids` in `writer::file_writer::parquet_writer::tests`: builds a record batch whose Arrow schema lacks `PARQUET:field_id` and asserts the write fails with a `DataInvalid` error naming the missing metadata key. Existing writer tests, including the complex/map schema test, continue to pass.
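For readers who want a feel for the check described above, here is a minimal, std-only sketch of the recursion. It is not the code in this PR: `Field`, `DataType`, `collect_missing`, and `validate_schema` are hypothetical stand-ins for the Arrow schema types and the writer's internal validation, and the `String` error stands in for the `DataInvalid` error the PR returns. Only the metadata key `PARQUET:field_id` and the skip-the-`entries`-wrapper rule come from the description.

```rust
use std::collections::HashMap;

/// Metadata key named in the PR description.
const FIELD_ID_KEY: &str = "PARQUET:field_id";

/// Hypothetical, simplified stand-in for an Arrow data type.
#[derive(Debug)]
enum DataType {
    Primitive,
    Struct(Vec<Field>),
    List(Box<Field>),
    /// Holds the `entries` wrapper field, whose type is a struct of key/value.
    Map(Box<Field>),
}

/// Hypothetical, simplified stand-in for an Arrow field.
#[derive(Debug)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, field_id: Option<i32>, data_type: DataType) -> Self {
        let mut metadata = HashMap::new();
        if let Some(id) = field_id {
            metadata.insert(FIELD_ID_KEY.to_string(), id.to_string());
        }
        Field { name: name.to_string(), metadata, data_type }
    }
}

/// Records the dotted path of `field` if it lacks a field id, then recurses.
fn collect_missing(field: &Field, parent: &str, missing: &mut Vec<String>) {
    let path = if parent.is_empty() {
        field.name.clone()
    } else {
        format!("{}.{}", parent, field.name)
    };
    if !field.metadata.contains_key(FIELD_ID_KEY) {
        missing.push(path.clone());
    }
    visit_children(&field.data_type, &path, missing);
}

/// Walks nested struct/list/map children of a data type.
fn visit_children(data_type: &DataType, path: &str, missing: &mut Vec<String>) {
    match data_type {
        DataType::Primitive => {}
        DataType::Struct(children) => {
            for child in children {
                collect_missing(child, path, missing);
            }
        }
        DataType::List(element) => collect_missing(element, path, missing),
        // The `entries` wrapper carries no field id of its own, so it is not
        // checked; only its key/value children are.
        DataType::Map(entries) => visit_children(&entries.data_type, path, missing),
    }
}

/// Returns an error naming every field that lacks the field-id metadata.
fn validate_schema(fields: &[Field]) -> Result<(), String> {
    let mut missing = Vec::new();
    for field in fields {
        collect_missing(field, "", &mut missing);
    }
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "fields missing `{}` metadata: {}",
            FIELD_ID_KEY,
            missing.join(", ")
        ))
    }
}
```

In this sketch, calling `validate_schema` on a schema whose fields all carry the key returns `Ok(())`, while a hand-built schema yields a single error listing every offending path, so the caller can fix them all in one pass instead of one failure at a time.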
