felixscherz commented on issue #716: URL: https://github.com/apache/iceberg-python/issues/716#issuecomment-2118806030
Hi, I investigated this a bit further, and it seems to be related to the way the visitor works, as @cgbur suggested. Here is what I tried:

```python
import pyarrow as pa

from pyiceberg.catalog import Catalog
from pyiceberg.io.pyarrow import parquet_path_to_id_mapping


def test_parquet_path_to_id_mapping():
    # set the list's value field name to "item"
    pyarrow_list = pa.schema([
        pa.field("extras", pa.list_(pa.field("item", pa.struct([pa.field("key", pa.string()), pa.field("value", pa.string())]))))
    ])
    # this is called during table creation
    schema = Catalog._convert_schema_if_needed(pyarrow_list)
    mapping = parquet_path_to_id_mapping(schema)
    assert "extras.list.item.key" in mapping
```

The mapping built from the schema that `Catalog._convert_schema_if_needed` returns looks like this:

```python
{'extras.list.element.key': -1, 'extras.list.element.value': -1}
```

Looking into the visitor, I found that the method dealing with list types sets a default field name of "element":

https://github.com/apache/iceberg-python/blob/20c273104257f0a1ccd74a09f6d4601643115ffd/pyiceberg/io/pyarrow.py#L865-L870
https://github.com/apache/iceberg-python/blob/20c273104257f0a1ccd74a09f6d4601643115ffd/pyiceberg/io/pyarrow.py#L172

So we lose the name of the list's value field ("item" in the test above) and it is replaced with "element", which is why the assertion on "extras.list.item.key" fails. Unfortunately, I haven't found a way to access the field name, as neither [pyarrow.lib.ListType](https://arrow.apache.org/docs/python/generated/pyarrow.ListType.html) nor [pyarrow.lib.DataType](https://arrow.apache.org/docs/python/generated/pyarrow.DataType.html#pyarrow.DataType) seems to make it available.
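To make the effect concrete, here is a toy sketch of how the path gets built. It does not use pyiceberg or pyarrow: `Primitive`, `Struct`, `ListOf`, `Field`, `leaf_paths`, and the `keep_list_field_name` flag are hypothetical names made up for illustration, and only the hard-coded "element" default mirrors what the mapping output shows.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical stand-ins for Arrow types; these are NOT pyarrow or pyiceberg classes.
@dataclass
class Primitive:
    name: str


@dataclass
class Struct:
    fields: List["Field"]


@dataclass
class ListOf:
    value_field: "Field"


@dataclass
class Field:
    name: str
    type: Union[Primitive, Struct, ListOf]


# Assumed default, chosen to match the "element" segment in the observed mapping.
DEFAULT_ELEMENT_NAME = "element"


def leaf_paths(field: Field, prefix: str = "", keep_list_field_name: bool = False) -> List[str]:
    """Return dotted paths to every leaf below `field`."""
    path = f"{prefix}.{field.name}" if prefix else field.name
    if isinstance(field.type, Struct):
        paths: List[str] = []
        for child in field.type.fields:
            paths.extend(leaf_paths(child, path, keep_list_field_name))
        return paths
    if isinstance(field.type, ListOf):
        source = field.type.value_field
        # The original value field name is dropped here unless the flag is set.
        name = source.name if keep_list_field_name else DEFAULT_ELEMENT_NAME
        return leaf_paths(Field(name, source.type), f"{path}.list", keep_list_field_name)
    return [path]


extras = Field(
    "extras",
    ListOf(Field("item", Struct([Field("key", Primitive("string")), Field("value", Primitive("string"))]))),
)

default_paths = leaf_paths(extras)
preserved_paths = leaf_paths(extras, keep_list_field_name=True)
```

With the default, `default_paths` contains "extras.list.element.key", so a lookup for "extras.list.item.key" misses. Only when the original name is carried through (`preserved_paths`) does the "item" path appear.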