rshkv commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2613020929

I'm still working on this and will continue to do so. I find it quite difficult to get Arrow and Iceberg to agree on types because of field ids.

The Iceberg schema requires that all fields have a field id, which in the converted Arrow schema becomes type metadata. But when constructing a `StructArray` or `RecordBatch`, Arrow checks that the schema matches that of the data [(e.g.)](https://github.com/apache/arrow-rs/blob/0c07ec79cd4b28e7aa9d15d1d58b5c5adafb6855/arrow-array/src/record_batch.rs#L333). And they tend to _not_ match because the schema has field ids and the data does not.

E.g., with `MapBuilder`, we have [`with_values_field`](https://github.com/apache/arrow-rs/blob/0c07ec79cd4b28e7aa9d15d1d58b5c5adafb6855/arrow-array/src/builder/map_builder.rs#L114-L125) to pass in a field with a field id in its metadata. However, there is no `with_keys_field` equivalent. Yet the key field in the Iceberg schema must have a field id.

@liurenjie1024, I'd like to see this finished, and some clarity on which designs you'd merge would help. Here are some questions:

* We can make `schema()` return an Iceberg schema with field ids, but how important is it that a returned `RecordBatch` has field ids in its metadata? Are we ok with _not_ having field ids on `RecordBatch#schema` but only on `MetadataTable#schema`?
* If we _do_ need record batch types to have those field ids, we might need changes in arrow-rs to express something like _"a `RecordBatch` or `StructArray` may have fields with metadata, but the respective types of the underlying `ArrayData` don't need to match that metadata"_.

Let me know what you think or if I'm not being clear.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
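
For readers following the thread, the mismatch described above can be modeled in a few lines. This is a minimal, std-only sketch and not arrow-rs code: `Field`, `DataType`, `with_field_id`, and `check_column` are hypothetical stand-ins, and the metadata key is only illustrative. The one assumption carried over from the comment is that a nested field's metadata takes part in type equality, so a type built without field ids does not equal the schema's type.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an Arrow field. Equality includes metadata.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    metadata: BTreeMap<String, String>,
}

/// Hypothetical stand-in for an Arrow data type with nested fields.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Struct(Vec<Field>),
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field {
            name: name.to_string(),
            data_type,
            metadata: BTreeMap::new(),
        }
    }

    /// Attach a field id as metadata (the key name is illustrative).
    fn with_field_id(mut self, id: i32) -> Self {
        self.metadata
            .insert("PARQUET:field_id".to_string(), id.to_string());
        self
    }
}

/// Model of the schema-vs-data check: the column's type must equal the
/// schema field's type, and nested field metadata is part of that equality.
fn check_column(schema_field: &Field, column_type: &DataType) -> Result<(), String> {
    if &schema_field.data_type == column_type {
        Ok(())
    } else {
        Err(format!(
            "column type does not match schema field '{}'",
            schema_field.name
        ))
    }
}

fn main() {
    // Schema converted from Iceberg: every nested field carries a field id.
    let schema_field = Field::new(
        "entry",
        DataType::Struct(vec![
            Field::new("key", DataType::Utf8).with_field_id(1),
            Field::new("value", DataType::Int32).with_field_id(2),
        ]),
    );

    // Data built by a builder that can tag the value field but not the key.
    let built = DataType::Struct(vec![
        Field::new("key", DataType::Utf8),
        Field::new("value", DataType::Int32).with_field_id(2),
    ]);
    assert!(check_column(&schema_field, &built).is_err());

    // With the key's field id present, the types agree.
    let tagged = DataType::Struct(vec![
        Field::new("key", DataType::Utf8).with_field_id(1),
        Field::new("value", DataType::Int32).with_field_id(2),
    ]);
    assert!(check_column(&schema_field, &tagged).is_ok());
}
```

In this model the only difference between the failing and the passing case is the metadata on the nested `key` field, which is the gap the comment attributes to the missing `with_keys_field`.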
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org