roosephu opened a new issue, #45686: URL: https://github.com/apache/arrow/issues/45686
### Describe the bug, including details regarding any error messages, version, and platform.

To reproduce:

```python
import pyarrow
import numpy as np

print(pyarrow.__version__)

N = 2**30 // 4
data = {
    "id": [4, 3, 2, 1],
    "data": [np.zeros(N, dtype=np.int64) + i for i in range(4)],
}
table = pyarrow.Table.from_pydict(data)
table2 = table.sort_by("id")
print(table2)
```

Actual output:

```
pyarrow.__version__ = '19.0.1'
pyarrow.Table
id: int64
data: list<item: int64>
  child 0, item: int64
----
id: [[1,2,3,4]]
data: [[[1,1,1,1,1,...,1,1,1,1,1],[0,0,0,0,0,...,0,0,0,0,0],[1,1,1,1,1,...,1,1,1,1,1],[0,0,0,0,0,...,0,0,0,0,0]]]
```

Changing dtype to `np.int8` gives the correct output:

```
pyarrow.__version__ = '19.0.1'
pyarrow.Table
id: int64
data: list<item: int8>
  child 0, item: int8
----
id: [[1,2,3,4]]
data: [[[3,3,3,3,3,...,3,3,3,3,3],[2,2,2,2,2,...,2,2,2,2,2],[1,1,1,1,1,...,1,1,1,1,1],[0,0,0,0,0,...,0,0,0,0,0]]]
```

My guess is that it overflows when calculating offsets during sorting, although I have no idea how pyarrow works internally.

### Component(s)

Python

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
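
The overflow guess in the report can be checked against the observed output with plain arithmetic, without allocating any large arrays. The sketch below is a hypothetical model, not pyarrow's actual code path: it assumes that the byte offset of each source row (row index × elements per row × item size) is reduced modulo 2**32 somewhere during the gather step of the sort. The helper name `wrapped_source_row` and the unsigned 32-bit wraparound are assumptions made for illustration only.

```python
# Hypothetical model of a 32-bit byte-offset wraparound.
# This does NOT call pyarrow and is not taken from its source code.

N = 2**30 // 4  # elements per list row, as in the report


def wrapped_source_row(row, itemsize, bits=32):
    """Return the row whose data would be read if the byte offset of
    `row` wrapped around modulo 2**bits (assumed, unsigned wraparound)."""
    row_bytes = N * itemsize          # size of one list row in bytes
    byte_offset = row * row_bytes     # true byte offset of the row
    wrapped = byte_offset % (2**bits)  # offset after the assumed wraparound
    return wrapped // row_bytes


# Sorting id = [4, 3, 2, 1] in ascending order selects rows 3, 2, 1, 0.
take_indices = [3, 2, 1, 0]

# Row i of "data" is filled with the value i, so the source row index
# equals the value that would be printed for that row.
for itemsize, name in [(8, "int64"), (1, "int8")]:
    print(name, [wrapped_source_row(r, itemsize) for r in take_indices])
# int64 [1, 0, 1, 0]
# int8 [3, 2, 1, 0]
```

Under this assumption, each `int64` row occupies exactly 2**31 bytes, so rows 2 and 3 alias rows 0 and 1, which matches the `1, 0, 1, 0` pattern in the actual output. With `int8`, the largest offset is 3 × 2**28 bytes, which stays below 2**32, matching the correct result. The match is suggestive but does not confirm where in pyarrow such a wraparound would occur.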