rshkv commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2613020929

   I'm still working on this and intend to see it through. I'm finding it quite 
difficult to get Arrow and Iceberg to agree on types because of field ids.
   
   The Iceberg schema requires that all fields have a field id, which in the 
converted Arrow schema becomes type metadata. But when constructing 
`StructArray` or `RecordBatch`, Arrow checks that the schema matches the types of 
the data 
([e.g.](https://github.com/apache/arrow-rs/blob/0c07ec79cd4b28e7aa9d15d1d58b5c5adafb6855/arrow-array/src/record_batch.rs#L333)).
They tend to _not_ match because the schema carries field ids and the data 
does not. 
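
   To illustrate the mismatch, here is a minimal sketch in plain Rust. It does 
not use arrow-rs: `DataType`, `Field`, `check_column`, `schema_side` and 
`data_side` are simplified stand-ins invented for this example, and the 
`PARQUET:field_id` key is only illustrative. The assumption it encodes is that 
nested fields, including their metadata, take part in type equality, so a 
struct column built without field ids compares unequal to the schema's struct 
type.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for Arrow's types (hypothetical, NOT the arrow-rs API).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    // Nested fields carry their own metadata, so it is part of type equality.
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    metadata: BTreeMap<String, String>,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field {
            name: name.to_string(),
            data_type,
            metadata: BTreeMap::new(),
        }
    }

    // Attach a field id as metadata, as a converted Iceberg schema would.
    // The key name is an assumption made for illustration.
    fn with_field_id(mut self, id: i32) -> Self {
        self.metadata
            .insert("PARQUET:field_id".to_string(), id.to_string());
        self
    }
}

// Mimics a strict constructor check: the column's type must equal the
// type declared by the schema field.
fn check_column(schema_field: &Field, column_type: &DataType) -> Result<(), String> {
    if &schema_field.data_type != column_type {
        return Err(format!(
            "column type {:?} does not match schema type {:?}",
            column_type, schema_field.data_type
        ));
    }
    Ok(())
}

// Schema side: every field, including the nested one, has a field id.
fn schema_side() -> Field {
    let x = Field::new("x", DataType::Int32).with_field_id(2);
    Field::new("point", DataType::Struct(vec![x])).with_field_id(1)
}

// Data side: what a builder that knows nothing about field ids would produce.
fn data_side() -> DataType {
    DataType::Struct(vec![Field::new("x", DataType::Int32)])
}
```

   With these definitions, `check_column(&schema_side(), &data_side())` returns 
an `Err`, while passing `&schema_side().data_type` as the column type returns 
`Ok(())`.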
   
   For example, `MapBuilder` has 
[`with_values_field`](https://github.com/apache/arrow-rs/blob/0c07ec79cd4b28e7aa9d15d1d58b5c5adafb6855/arrow-array/src/builder/map_builder.rs#L114-L125)
 to pass in a field with a field id in its metadata. However, there is no 
`with_keys_field` equivalent, even though the key field in the Iceberg schema 
must also have a field id.
   
   @liurenjie1024, I'd like to see this finished, and some clarity on which 
designs you'd merge would help. Here are some questions:
   * We can make `schema()` return an Iceberg schema with field ids, but how 
important is it that a returned `RecordBatch` has field ids in its metadata? Are 
we ok with _not_ having field ids on `RecordBatch#schema` but only on 
`MetadataTable#schema`?
   * If we _do_ need record batch types to have those field ids, we might 
need changes in arrow-rs to express something like "a `RecordBatch` or 
`StructArray` may have fields with metadata, but the respective types of the 
underlying `ArrayData` do _not_ need to match that metadata".
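
   To make the second question concrete, here is a rough sketch of what a 
metadata-insensitive comparison could look like. It is plain Rust over 
simplified stand-in types, not arrow-rs code: `DataType`, `Field`, `field` and 
`same_type_ignoring_metadata` are all hypothetical names, and nothing here is 
an existing or proposed arrow-rs API.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for Arrow's types (hypothetical, NOT the arrow-rs API).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    metadata: BTreeMap<String, String>,
}

// Helper for building fields, optionally tagged with a field id.
// The metadata key is an assumption made for illustration.
fn field(name: &str, data_type: DataType, field_id: Option<i32>) -> Field {
    let mut metadata = BTreeMap::new();
    if let Some(id) = field_id {
        metadata.insert("PARQUET:field_id".to_string(), id.to_string());
    }
    Field {
        name: name.to_string(),
        data_type,
        metadata,
    }
}

// Compare two types structurally (names and nested types) while ignoring
// field metadata at every nesting level.
fn same_type_ignoring_metadata(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        (DataType::Int32, DataType::Int32) => true,
        (DataType::Struct(fa), DataType::Struct(fb)) => {
            fa.len() == fb.len()
                && fa.iter().zip(fb).all(|(x, y)| {
                    x.name == y.name
                        && same_type_ignoring_metadata(&x.data_type, &y.data_type)
                })
        }
        _ => false,
    }
}
```

   Under such a check, a struct type whose fields carry field ids and the same 
struct type without them would be accepted as compatible, even though strict 
equality (`==`) on the two still reports a difference.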
   
   Let me know what you think or if I'm not being clear. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

