felixscherz commented on issue #716:
URL: https://github.com/apache/iceberg-python/issues/716#issuecomment-2118806030

   Hi, I investigated this a bit further and it seems to be related to the way 
the visitor works as @cgbur suggested. Here is what I tried:
   
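   ```python
   import pyarrow as pa

   from pyiceberg.catalog import Catalog
   from pyiceberg.io.pyarrow import parquet_path_to_id_mapping


   def test_parquet_path_to_id_mapping():
       # name the list's value field "item"
       pyarrow_list = pa.schema([
           pa.field(
               "extras",
               pa.list_(pa.field("item", pa.struct([
                   pa.field("key", pa.string()),
                   pa.field("value", pa.string()),
               ]))),
           ),
       ])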
      
       # this is called during table creation
       schema = Catalog._convert_schema_if_needed(pyarrow_list)
   
       mapping = parquet_path_to_id_mapping(schema)
       assert "extras.list.item.key" in mapping
   ```
   The mapping built from the schema that `Catalog._convert_schema_if_needed` returns looks like this:
   ```python
   {'extras.list.element.key': -1, 'extras.list.element.value': -1}
   ```
   
   Looking into the visitor, I found that the method dealing with list types sets a default field name of "element".
   
https://github.com/apache/iceberg-python/blob/20c273104257f0a1ccd74a09f6d4601643115ffd/pyiceberg/io/pyarrow.py#L865-L870
   
https://github.com/apache/iceberg-python/blob/20c273104257f0a1ccd74a09f6d4601643115ffd/pyiceberg/io/pyarrow.py#L172
   
   So we lose the name of the list's value field ("item" in the test above), because the visitor replaces it with "element".
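
To illustrate the effect in isolation, here is a toy sketch in plain Python. It is not pyiceberg code: `ToyStruct`, `ToyList`, `leaf_paths`, and `DEFAULT_ELEMENT_NAME` are hypothetical names made up for this example, and the traversal only assumes the behaviour described above, namely that the list step always emits a fixed element name.

```python
from dataclasses import dataclass

# Assumption: stands in for the fixed name the visitor gives list elements.
DEFAULT_ELEMENT_NAME = "element"


@dataclass
class ToyStruct:
    fields: dict  # field name -> child node, or None for a leaf


@dataclass
class ToyList:
    element_name: str  # the name given in the source schema, e.g. "item"
    element_type: object


def leaf_paths(node, prefix=(), keep_list_names=False):
    """Return the dotted path of every leaf below `node`."""
    if isinstance(node, ToyStruct):
        paths = []
        for name, child in node.fields.items():
            paths += leaf_paths(child, prefix + (name,), keep_list_names)
        return paths
    if isinstance(node, ToyList):
        # This is where the source name is either kept or overwritten.
        name = node.element_name if keep_list_names else DEFAULT_ELEMENT_NAME
        return leaf_paths(node.element_type, prefix + ("list", name), keep_list_names)
    return [".".join(prefix)]


schema = ToyStruct({"extras": ToyList("item", ToyStruct({"key": None, "value": None}))})

overwritten = leaf_paths(schema)
preserved = leaf_paths(schema, keep_list_names=True)
```

With the fixed name, `"extras.list.item.key"` is absent from `overwritten`, which mirrors the failing assertion in the test; it only appears in `preserved`, where the source name is carried through.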
   
   Unfortunately, I haven't found a way to access the field name, as neither [pyarrow.lib.ListType](https://arrow.apache.org/docs/python/generated/pyarrow.ListType.html) nor [pyarrow.lib.DataType](https://arrow.apache.org/docs/python/generated/pyarrow.DataType.html#pyarrow.DataType) seems to make that available.

