alex-d-jensen opened a new issue, #45640:
URL: https://github.com/apache/arrow/issues/45640

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ## Sample code to demonstrate:
   ```python
   import pandas as pd
   import pyarrow as pa
   
   pandas_dataframe = pd.DataFrame(
       {
           "x_simple_col": [123],
           "struct_col": [
               {
                   "col9": "a_string",
                   "col1": True,
                   "a_nested_struct": {
                       "field": 1,
                       "a_field": 2,
                   },
                   "b_array": ["cheese"],
               },
           ],
       }
   )
   
   
   pyarrow_schema = pa.schema(
       fields=[
           pa.field("x_simple_col", pa.int64()),
           pa.field(
               name="struct_col",
               type=pa.struct(
                   fields=[
                       pa.field(name="col9", type=pa.string()),
                       pa.field(name="col1", type=pa.bool_()),
                       pa.field(
                           name="a_nested_struct",
                           type=pa.struct(
                               fields=[
                                   pa.field(name="field", type=pa.int64()),
                                   pa.field(name="a_field", type=pa.int64()),
                               ]
                           ),
                       ),
                       pa.field(name="b_array", type=pa.list_(value_type=pa.string())),
                   ]
               ),
           ),
       ]
   )
   
   inferred_schema = pa.Schema.from_pandas(pandas_dataframe)
   
   assert pyarrow_schema == inferred_schema
   
   pyarrow_schema
   inferred_schema
   
   ```
   
   Gives output:
   ```
   >>> assert pyarrow_schema == inferred_schema
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   AssertionError
   >>> 
   >>> pyarrow_schema
   x_simple_col: int64
   struct_col: struct<col9: string, col1: bool, a_nested_struct: struct<field: int64, a_field: int64>, b_array: list<item: string>>
     child 0, col9: string
     child 1, col1: bool
     child 2, a_nested_struct: struct<field: int64, a_field: int64>
         child 0, field: int64
         child 1, a_field: int64
     child 3, b_array: list<item: string>
         child 0, item: string
   >>> inferred_schema
   x_simple_col: int64
   struct_col: struct<a_nested_struct: struct<a_field: int64, field: int64>, b_array: list<item: string>, col1: bool, col9: string>
     child 0, a_nested_struct: struct<a_field: int64, field: int64>
         child 0, a_field: int64
         child 1, field: int64
     child 1, b_array: list<item: string>
         child 0, item: string
     child 2, col1: bool
     child 3, col9: string
   -- schema metadata --
   pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 499
   ```
   
   ## Expected result:
   
   `inferred_schema` and `pyarrow_schema` should match, including the order of struct fields (the documentation for structs mentions that fields are ordered and that order matters when comparing schemas).
   
   ## Actual result:
   The inferred schema lists struct fields (including fields of nested structs, at any depth) in alphabetical order rather than in the order found in the data from which the schema is inferred via `from_pandas`.
   Regular columns keep their given order; this only affects fields inside structs.
   
   ## System info:
   output from sw_vers:
   ProductName:            macOS
   ProductVersion:         15.3.1
   BuildVersion:           24D70
   
   pyarrow version: 18.1.0 (also tried on 19.0.1).
   pandas version: 2.2.3
   
   ### Component(s)
   
   Python

