cgbur commented on issue #716:
URL: https://github.com/apache/iceberg-python/issues/716#issuecomment-2101251170

   Ah, confusingly, there appear to be differences between writers that cause the issue. 
My Rust pyarrow implementation matches the polars output when polars writes with `use_pyarrow=True`.
   
   ```python
   import polars as pl
   import pyarrow.parquet as pq
   
   df = pl.DataFrame(
       {
           "a": [[{"a": 1}, {"a": 2}], [{"a": 3}]],
       }
   )
   
   
   def print_schema_path(path, col_name):
       metadata = pq.read_metadata(path)
       for group_number in range(metadata.num_row_groups):
           row_group = metadata.row_group(group_number)
           for column_number in range(row_group.num_columns):
               column = row_group.column(column_number)
               if column.path_in_schema.startswith(col_name):
                   print(f"path_in_schema: {column.path_in_schema}")
   
   
   df.write_parquet("example.parquet", use_pyarrow=False)
   print("with polars")
   print(pq.read_schema("example.parquet"))
   print_schema_path("example.parquet", "a")
   df.write_parquet("example.parquet", use_pyarrow=True)
   print("with pyarrow")
   print(pq.read_schema("example.parquet"))
   print_schema_path("example.parquet", "a")
   ```
   
   ```
   with polars
   a: large_list<item: struct<a: int64>>
     child 0, item: struct<a: int64>
         child 0, a: int64
   path_in_schema: a.list.item.a
   with pyarrow
   a: large_list<element: struct<a: int64>>
     child 0, element: struct<a: int64>
         child 0, a: int64
   path_in_schema: a.list.element.a
   ```
   
   Perhaps the visitor is not respecting the list element name used in the file's 
schema (`item` vs. `element`)? Or is there a mismatch in how the column path is 
looked up when converting between the Iceberg and Parquet schemas? 
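
To make the first guess concrete, here is a minimal sketch of building column paths from the names actually present in the file schema instead of assuming `element`. It is a toy model, not the pyiceberg visitor: `Primitive`, `Struct`, `ListOf`, and `leaf_paths` are hypothetical names made up for illustration, and the only facts taken from the output above are the two `path_in_schema` strings.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Toy schema model (hypothetical, for illustration only).
@dataclass
class Primitive:
    name: str


@dataclass
class Struct:
    name: str
    fields: List["Node"]


@dataclass
class ListOf:
    name: str
    element: "Node"  # its name is whatever the writer chose: "item", "element", ...


Node = Union[Primitive, Struct, ListOf]


def leaf_paths(node: Node, prefix: Tuple[str, ...] = ()) -> List[str]:
    """Return the dotted Parquet-style path of every leaf below `node`."""
    here = prefix + (node.name,)
    if isinstance(node, Primitive):
        return [".".join(here)]
    if isinstance(node, Struct):
        return [p for child in node.fields for p in leaf_paths(child, here)]
    # List: the repeated group is called "list"; the element keeps its own name
    # rather than being hard-coded to "element".
    return leaf_paths(node.element, here + ("list",))


written_by_polars = ListOf("a", Struct("item", [Primitive("a")]))
written_by_pyarrow = ListOf("a", Struct("element", [Primitive("a")]))

print(leaf_paths(written_by_polars))   # ['a.list.item.a']
print(leaf_paths(written_by_pyarrow))  # ['a.list.element.a']
```

If the lookup instead hard-codes `element`, the polars-written file would have no column at the expected path, which would be consistent with the behaviour reported here.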
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

