cgbur opened a new issue, #716: URL: https://github.com/apache/iceberg-python/issues/716
### Apache Iceberg version

main (development)

### Please describe the bug 🐞

When using the `add_files` table API, the Parquet metadata has to be read, and [`data_file_statistics_from_parquet_metadata`](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L1670) uses a `Dict[str, int]` mapping to link each column name in the Parquet file to its field ID for statistics collection. However, during [the mapping lookup](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L1727) I received an error saying that a key was not present.

My schema contains the following field (it's a subfield of a `Details` struct, which matters for the full name later):

```
extras: large_list<item: struct<key: string not null, value: string>> not null
  child 0, item: struct<key: string not null, value: string>
      child 0, key: string not null
      child 1, value: string
```

Based on the Parquet schema path definition, its leaf columns have the paths:

```
Details.extras.list.item.key
Details.extras.list.item.value
```

The issue is that [`parquet_path_to_id_mapping`](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L1587) returns the following mapping for these two fields:

```
Details.extras.list.element.key -> 189
Details.extras.list.element.value -> 190
```

So the issue appears to be that the visitor constructing the schema paths incorrectly uses `element` instead of `item`, which is what the Parquet schema paths contain. I am not sure how this manifests yet, as I have not dug into it closely.
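To make the mismatch concrete, here is a minimal, stdlib-only sketch. It hard-codes the two sets of paths quoted in this report instead of reading a real Parquet file, and the helpers `normalize_list_element` and `lookup_field_id` are hypothetical illustrations, not part of PyIceberg. An exact dictionary lookup fails just as reported; retrying with the `list.item` segment rewritten to `list.element` recovers the field IDs.

```python
from typing import Dict

# Paths -> field IDs as returned by parquet_path_to_id_mapping in this report
# (hard-coded here for illustration).
ICEBERG_PATH_TO_ID: Dict[str, int] = {
    "Details.extras.list.element.key": 189,
    "Details.extras.list.element.value": 190,
}

# Leaf column paths as they appear in the Parquet file from this report.
FILE_COLUMN_PATHS = [
    "Details.extras.list.item.key",
    "Details.extras.list.item.value",
]


def normalize_list_element(path: str) -> str:
    """Hypothetical helper: rewrite `<name>.list.item` to `<name>.list.element`.

    Heuristic only: any `item` segment directly after a `list` segment is
    treated as the repeated element of a 3-level LIST group. It would also
    rewrite a struct field literally named `list` with a child named `item`.
    """
    parts = path.split(".")
    for i in range(1, len(parts)):
        if parts[i] == "item" and parts[i - 1] == "list":
            parts[i] = "element"
    return ".".join(parts)


def lookup_field_id(mapping: Dict[str, int], path: str) -> int:
    """Hypothetical helper: exact lookup first, then retry with the normalized name.

    Still raises KeyError if neither form of the path is present.
    """
    if path in mapping:
        return mapping[path]
    return mapping[normalize_list_element(path)]


if __name__ == "__main__":
    for p in FILE_COLUMN_PATHS:
        # A plain ICEBERG_PATH_TO_ID[p] would raise KeyError: the reported symptom.
        print(p, "in mapping:", p in ICEBERG_PATH_TO_ID)
        print(p, "->", lookup_field_id(ICEBERG_PATH_TO_ID, p))
```

The string rewrite is only a workaround for illustration; a sturdier approach would more likely derive names from the Parquet schema's LIST structure rather than from dotted path strings. As a possible origin of the `item` naming: `pyarrow.parquet.write_table` has a `use_compliant_nested_type` option, and with it set to `False` the list's child field keeps the Arrow element name, which defaults to `item`.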