JacobWBradford opened a new issue, #44741:
URL: https://github.com/apache/arrow/issues/44741

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   There appears to be some ambiguity in the `names` attribute on 
`pyarrow.parquet.ParquetSchema` vs `pyarrow.Schema` when it comes to nested 
logical type columns.
   
   I generated a parquet file with three columns, one containing list objects 
and the other two containing strings. When loading the two available schemas 
from `pyarrow.parquet.ParquetFile` (`schema` and `schema_arrow`), the 
corresponding `<schema>.names` differ for the List type column. `schema_arrow` 
appears to correctly list the column name, whereas `schema` lists the name of 
the lowest-level field in the structure (which, going off of the 
[Logical Types documentation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists), 
would usually be `element`).
   
   I've included a simple example below, tested using pyarrow 17.0.0.
   
   ```python
   import pandas as pd
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   # Create parquet file
   filepath = "new_parquet_file.parquet"
   df = pd.DataFrame({'random_strings1': ["string1", "string2", "string3"],
                      'my_list': [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                      'random_strings2': ["string4", "string5", "string6"]})
   table = pa.Table.from_pandas(df)
   pq.write_table(table, filepath)
   
   # Examine parquet file schema
   with pq.ParquetFile(filepath) as file:
       print(file.schema.names)
       print(file.schema_arrow.names)
   ```
   ```
   ['random_strings1', 'element', 'random_strings2']
   ['random_strings1', 'my_list', 'random_strings2']
   ```
   
   Is this intended behavior from `pyarrow.parquet.ParquetSchema`? Since 
`schema_arrow` is available, it's still trivial to get the names of the 
columns, but the difference can trip up anyone who isn't aware of it.
   
   ### Component(s)
   
   Documentation, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.