jhostetler opened a new issue, #34250:
URL: https://github.com/apache/arrow/issues/34250
### Describe the bug, including details regarding any error messages, version, and platform.
Code example:
```python
>>> import pyarrow
>>> pyarrow.RecordBatch.from_pylist([{"start": 0, "end": 1, "tag": "foo"}]).schema
start: int64
end: int64
tag: string
>>> pyarrow.RecordBatch.from_pylist([{"spans": [{"start": 0, "end": 1, "tag": "foo"}]}]).schema
spans: list<item: struct<end: int64, start: int64, tag: string>>
  child 0, item: struct<end: int64, start: int64, tag: string>
      child 0, end: int64
      child 1, start: int64
      child 2, tag: string
```
In the first schema, the fields are in the same order as the keys in the input dictionary. In the second schema, where the `struct` is nested inside a `list`, the fields of the `struct` have been sorted by name. I would expect the fields to always follow the order of the input, as in the first schema. The more general principle is that the input (or at least the first row of input, which is used for schema inference) should validate against the inferred schema.
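The expected invariant can be written as a small check. This is a plain-Python sketch, not pyarrow code: `field_names_in_input_order` and `row_matches_fields` are hypothetical helpers, and the `sorted()` call only models the field order shown in the second schema above.

```python
from __future__ import annotations


def field_names_in_input_order(row: dict) -> list[str]:
    """Field names in the order the keys appear in the input dict.

    Python dicts preserve insertion order, so this is the order the user wrote.
    """
    return list(row.keys())


def row_matches_fields(row: dict, field_names: list[str]) -> bool:
    """Hypothetical check: True if the row's keys appear in exactly this field order."""
    return list(row.keys()) == field_names


span = {"start": 0, "end": 1, "tag": "foo"}

# Order matching the first (top-level) schema.
top_level = field_names_in_input_order(span)
# Order matching the second (nested) schema, modeled here as a sort by name.
nested = sorted(span.keys())

print(top_level)                            # ['start', 'end', 'tag']
print(nested)                               # ['end', 'start', 'tag']
print(row_matches_fields(span, top_level))  # True
print(row_matches_fields(span, nested))     # False
```

Under this model, the row used for inference passes the check against the top-level ordering but fails it against the nested ordering, which is the inconsistency described above.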
I suspect this behavior is related to `from_pylist()` accepting lists where
the elements are dictionaries with different key sets, such as:
```python
>>> pyarrow.RecordBatch.from_pylist([{"spans": [{"start": 0, "end": 1, "tag": "foo"}, {"new": 42}]}]).schema
spans: list<item: struct<end: int64, new: int64, start: int64, tag: string>>
  child 0, item: struct<end: int64, new: int64, start: int64, tag: string>
      child 0, end: int64
      child 1, new: int64
      child 2, start: int64
      child 3, tag: string
```
In this case, I'm guessing it sorts the fields because it needs a canonical ordering for the merged set of keys. It seems to me that it should simply fail to infer a schema here, since, again, the input is not valid according to the inferred schema.
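To make the trade-off concrete, here is a plain-Python sketch of three ways to pick struct fields for a list of dicts. All three functions are hypothetical and do not describe pyarrow's internals: `sorted_union` only models the ordering seen in the schema above (consistent with the guess that fields are sorted), `first_seen_union` is an alternative canonical ordering that still follows the input, and `strict_fields` implements the proposal to fail when the elements have different key sets.

```python
from __future__ import annotations


def sorted_union(rows: list[dict]) -> list[str]:
    """Model of the observed behavior: union of all keys, sorted by name."""
    names: set[str] = set()
    for row in rows:
        names.update(row.keys())
    return sorted(names)


def first_seen_union(rows: list[dict]) -> list[str]:
    """Union of all keys, in order of first appearance across the rows."""
    # dict.fromkeys keeps insertion order; update() appends only unseen keys.
    names: dict[str, None] = {}
    for row in rows:
        names.update(dict.fromkeys(row))
    return list(names)


def strict_fields(rows: list[dict]) -> list[str]:
    """Refuse to infer fields if the rows disagree on their key sets."""
    if not rows:
        raise ValueError("cannot infer struct fields from an empty list")
    expected = list(rows[0].keys())
    for i, row in enumerate(rows[1:], start=1):
        if set(row) != set(expected):
            raise ValueError(f"row {i} has keys {list(row.keys())}, expected {expected}")
    # Same key set everywhere: keep the first row's order.
    return expected


spans = [{"start": 0, "end": 1, "tag": "foo"}, {"new": 42}]
print(sorted_union(spans))      # ['end', 'new', 'start', 'tag']
print(first_seen_union(spans))  # ['start', 'end', 'tag', 'new']
try:
    strict_fields(spans)
except ValueError as exc:
    print(exc)  # row 1 has keys ['new'], expected ['start', 'end', 'tag']
```

The first-seen union shows that a deterministic ordering does not require sorting, while the strict variant matches the suggestion that inference should fail outright for inconsistent inputs.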
### Component(s)
Python
--
This is an automated message from the Apache Git Service.