alanhdu opened a new issue, #44340: URL: https://github.com/apache/arrow/issues/44340
### Describe the bug, including details regarding any error messages, version, and platform.

I have a table with lots of strings that I would like to export to Pandas. The following code can recreate the error:

```python
import numpy as np
import pyarrow as pa

SIZE = 1024
N = 2 * 1024 * 1024

buffer = np.random.bytes(N * SIZE)
table = pa.Table.from_pydict({
    "row": [buffer[i * SIZE: (i + 1) * SIZE] for i in range(N)]
})
df = table.to_pandas(strings_to_categorical=True)
```

This is currently failing with the error:

```
Traceback (most recent call last):
  File "/home/alandu/workspace/scratch/repro.py", line 13, in <module>
    df = table.to_pandas(strings_to_categorical=True)
  File "pyarrow/array.pxi", line 885, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 5002, in pyarrow.lib.Table._to_pandas
  File "/home/alandu/micromamba/envs/test/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 784, in table_to_dataframe
    result = pa.lib.table_to_blocks(options, table, categories,
  File "pyarrow/table.pxi", line 3941, in pyarrow.lib.table_to_blocks
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowCapacityError: array cannot contain more than 2147483646 bytes, have 2147483648
```

This is using Python 3.10 with PyArrow 17.0 on Linux (installed via conda-forge). This *only* seems to happen when I set `strings_to_categorical=True` -- if that is `False`, then I can export this table to a dataframe without issues.

### Component(s)

Python
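(Editor's note: a possible workaround sketch, not part of the original report and untested against this dataset. Since the failure appears specific to Arrow's `strings_to_categorical` conversion path, which builds a single array subject to the ~2 GiB limit, the categorical encoding could instead be done on the pandas side after a plain `to_pandas()` call.)

```python
import numpy as np
import pyarrow as pa

SIZE = 1024
N = 2 * 1024 * 1024

buffer = np.random.bytes(N * SIZE)
table = pa.Table.from_pydict({
    "row": [buffer[i * SIZE: (i + 1) * SIZE] for i in range(N)]
})

# Convert without strings_to_categorical (this path reportedly succeeds) ...
df = table.to_pandas()
# ... and dictionary-encode on the pandas side instead.
df["row"] = df["row"].astype("category")
```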