jhostetler opened a new issue, #34250:
URL: https://github.com/apache/arrow/issues/34250

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Code example:
   ```
   >>> import pyarrow
   >>> pyarrow.RecordBatch.from_pylist([{"start": 0, "end": 1, "tag": "foo"}]).schema
   start: int64
   end: int64
   tag: string
   >>> pyarrow.RecordBatch.from_pylist([{"spans": [{"start": 0, "end": 1, "tag": "foo"}]}]).schema
   spans: list<item: struct<end: int64, start: int64, tag: string>>
     child 0, item: struct<end: int64, start: int64, tag: string>
         child 0, end: int64
         child 1, start: int64
         child 2, tag: string
   ```
   
   In the first schema, the top-level fields are in the same order as the 
keys in the input dictionary. In the second schema, where the `struct` is nested 
inside a `list`, the fields of the `struct` have been sorted by name. I would 
expect the fields to always keep the order of the input keys (as in the 
first schema). The more general principle is that the input (or at least 
the first row of input used for schema inference) should validate 
against the inferred schema.
   
   I suspect this behavior is related to `from_pylist()` accepting lists where 
the elements are dictionaries with different key sets, such as:
   ```
   >>> pyarrow.RecordBatch.from_pylist([{"spans": [{"start": 0, "end": 1, "tag": "foo"}, {"new": 42}]}]).schema
   spans: list<item: struct<end: int64, new: int64, start: int64, tag: string>>
     child 0, item: struct<end: int64, new: int64, start: int64, tag: string>
         child 0, end: int64
         child 1, new: int64
         child 2, start: int64
         child 3, tag: string
   ```
   In this case I'm guessing it sorts the fields because it needs to come up 
with a canonical ordering. It seems to me that it should just fail to infer a 
schema here, since again the inputs are not valid according to the inferred 
schema.
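
To make the suggestion concrete, here is a rough pure-Python sketch of the strict behavior I have in mind. `infer_struct_keys_strict` is a hypothetical helper written for illustration; it is not part of pyarrow and says nothing about how pyarrow's inference is actually implemented:

```python
def infer_struct_keys_strict(dicts):
    """Return the key order of the first dict; raise if a later dict has different keys.

    Hypothetical helper for illustration only, not a pyarrow API.
    """
    if not dicts:
        raise ValueError("cannot infer fields from an empty list")
    expected = list(dicts[0])  # dicts preserve insertion order
    for i, d in enumerate(dicts[1:], start=1):
        if set(d) != set(expected):
            raise ValueError(
                f"element {i} has keys {sorted(d)}, expected {sorted(expected)}"
            )
    return expected


# Uniform key sets: the input order is kept.
print(infer_struct_keys_strict([{"start": 0, "end": 1, "tag": "foo"}]))

# Differing key sets: inference fails instead of merging and sorting.
try:
    infer_struct_keys_strict([{"start": 0, "end": 1, "tag": "foo"}, {"new": 42}])
except ValueError as exc:
    print("inference failed:", exc)
```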
   
   ### Component(s)
   
   Python

