liurenjie1024 commented on issue #1560: URL: https://github.com/apache/iceberg-rust/issues/1560#issuecomment-3149922656
> Hi [@liurenjie1024](https://github.com/liurenjie1024), thanks for the suggestion! However, I'm not sure how passing `ArrowSchema` to `ArrowArrayAccessor` can free us from matching fields via `field.name`. Could you please elaborate?
>
> My understanding is that even if we convert the schema to `ArrowSchema`, the schema still won't have `PARQUET:field_id` for the `insert_into` case, and we will need to match using `field.name`.

Hi @CTTY, when we convert an iceberg schema to an arrow schema, we insert the `PARQUET:field_id` metadata; see https://github.com/apache/iceberg-rust/blob/fbc3716c7eac6bba6f1902610407e82e925a83ba/crates/iceberg/src/arrow/schema.rs#L466.

But I'm questioning the necessity of id matching, or even name matching, in this case. From a user's point of view, they just need to ensure that the arrow arrays they pass in match iceberg's schema, e.g. that the types match. They don't need to care about names or ids. I think a better approach is to match arrays simply by order and type. For example, when the writer's iceberg schema is the following:

```
{
  id int,
  name string,
  address string
}
```

the user is expected to pass record batches of three arrays:

```
int array
string array
string array
```

They should follow the order of the iceberg schema, and the types should match. This requirement seems more user friendly to me. What do you think?
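The positional matching described above could be sketched roughly as follows. This is a simplified illustration, not the actual iceberg-rust or arrow-rs API: `FieldType`, `IcebergSchema`, `ArrowArray`, and `validate_by_position` are all hypothetical stand-in types introduced just to show that only order and type are consulted, never names or field ids.

```rust
// Hypothetical sketch (not the real iceberg-rust API): validate a record
// batch against an iceberg schema purely by position and type.

#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Int,
    String,
}

// Simplified stand-in for an iceberg schema: an ordered list of field types.
struct IcebergSchema {
    field_types: Vec<FieldType>,
}

// Simplified stand-in for an arrow array: only its type matters here.
struct ArrowArray {
    data_type: FieldType,
}

// Positional matching: the i-th array must have the i-th field's type.
// Field names and field ids are never consulted.
fn validate_by_position(schema: &IcebergSchema, arrays: &[ArrowArray]) -> Result<(), String> {
    if schema.field_types.len() != arrays.len() {
        return Err(format!(
            "expected {} arrays, got {}",
            schema.field_types.len(),
            arrays.len()
        ));
    }
    for (i, (expected, array)) in schema.field_types.iter().zip(arrays).enumerate() {
        if *expected != array.data_type {
            return Err(format!(
                "type mismatch at position {}: expected {:?}, got {:?}",
                i, expected, array.data_type
            ));
        }
    }
    Ok(())
}

fn main() {
    // Writer schema: { id int, name string, address string }
    let schema = IcebergSchema {
        field_types: vec![FieldType::Int, FieldType::String, FieldType::String],
    };

    // Arrays passed in schema order: int, string, string.
    let ok = vec![
        ArrowArray { data_type: FieldType::Int },
        ArrowArray { data_type: FieldType::String },
        ArrowArray { data_type: FieldType::String },
    ];
    assert!(validate_by_position(&schema, &ok).is_ok());

    // Wrong order fails even though the same set of types is present.
    let bad = vec![
        ArrowArray { data_type: FieldType::String },
        ArrowArray { data_type: FieldType::Int },
        ArrowArray { data_type: FieldType::String },
    ];
    assert!(validate_by_position(&schema, &bad).is_err());
}
```

The upside of this contract is that the writer never has to propagate `PARQUET:field_id` metadata or rely on name equality; the downside is that a caller who reorders columns gets a type error (or, for same-typed columns, silent misplacement) rather than a name-based remap.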
