liurenjie1024 commented on issue #1560: URL: https://github.com/apache/iceberg-rust/issues/1560#issuecomment-3149922656
> Hi [@liurenjie1024](https://github.com/liurenjie1024), thanks for the suggestion! However, I'm not sure how passing `ArrowSchema` to `ArrowArrayAccessor` can free us from matching fields via `field.name`. Could you please elaborate?
>
> My understanding is that even if we convert the schema to `ArrowSchema`, the schema still won't have `PARQUET:field_id` for the `insert_into` case, and we will need to match using `field.name`.

Hi @CTTY, when we convert an iceberg schema to an arrow schema, we insert the `PARQUET:field_id` metadata; see https://github.com/apache/iceberg-rust/blob/fbc3716c7eac6bba6f1902610407e82e925a83ba/crates/iceberg/src/arrow/schema.rs#L466.

But I'm questioning the necessity of id matching, or even name matching, in this case. From a user's point of view, they just need to ensure that the arrow arrays they pass in match iceberg's schema, e.g. that the types match. They don't need to care about names or ids. I think a better approach is to match arrays simply by order and type. For example, when the writer's iceberg schema is the following:

```
{
  id int,
  name string,
  address string
}
```

the user is expected to pass record batches of three arrays:

```
int array
string array
string array
```

They should follow the order of the iceberg schema, and the types should match. This requirement seems more user friendly to me. What do you think?
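The positional matching described above could be sketched roughly as follows. This is a simplified illustration, not the actual iceberg-rust or arrow-rs API: `FieldType`, `IcebergSchema`, `ArrowArray`, and `validate_by_position` are all hypothetical stand-in types introduced just to show that only order and type are consulted, never names or field ids.

```rust
// Hypothetical sketch (not the real iceberg-rust API): validate a record
// batch against an iceberg schema purely by position and type.

#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Int,
    String,
}

// Simplified stand-in for an iceberg schema: an ordered list of field types.
struct IcebergSchema {
    field_types: Vec<FieldType>,
}

// Simplified stand-in for an arrow array: only its type matters here.
struct ArrowArray {
    data_type: FieldType,
}

// Positional matching: the i-th array must have the i-th field's type.
// Field names and field ids are never consulted.
fn validate_by_position(schema: &IcebergSchema, arrays: &[ArrowArray]) -> Result<(), String> {
    if schema.field_types.len() != arrays.len() {
        return Err(format!(
            "expected {} arrays, got {}",
            schema.field_types.len(),
            arrays.len()
        ));
    }
    for (i, (expected, array)) in schema.field_types.iter().zip(arrays).enumerate() {
        if *expected != array.data_type {
            return Err(format!(
                "type mismatch at position {}: expected {:?}, got {:?}",
                i, expected, array.data_type
            ));
        }
    }
    Ok(())
}

fn main() {
    // Writer schema: { id int, name string, address string }
    let schema = IcebergSchema {
        field_types: vec![FieldType::Int, FieldType::String, FieldType::String],
    };

    // Arrays passed in schema order: int, string, string.
    let ok = vec![
        ArrowArray { data_type: FieldType::Int },
        ArrowArray { data_type: FieldType::String },
        ArrowArray { data_type: FieldType::String },
    ];
    assert!(validate_by_position(&schema, &ok).is_ok());

    // Wrong order fails even though the same set of types is present.
    let bad = vec![
        ArrowArray { data_type: FieldType::String },
        ArrowArray { data_type: FieldType::Int },
        ArrowArray { data_type: FieldType::String },
    ];
    assert!(validate_by_position(&schema, &bad).is_err());
}
```

The upside of this contract is that the writer never has to propagate `PARQUET:field_id` metadata or rely on name equality; the downside is that a caller who reorders columns gets a type error (or, for same-typed columns, silent misplacement) rather than a name-based remap.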
