r-matejko opened a new issue, #47246: URL: https://github.com/apache/arrow/issues/47246
### Describe the bug, including details regarding any error messages, version, and platform.

According to the documentation, NumPy arrays passed to pyarrow.array(..) are converted to either a pyarrow.Array or a pyarrow.ChunkedArray depending on their size. pa.DictionaryArray.from_arrays(..) is also documented to accept ndarrays, but when the ndarray is large, the internal conversion yields a ChunkedArray and from_arrays then raises an exception because it cannot handle a ChunkedArray.

```
import pyarrow as pa
import numpy as np

# 300 MB binary blobs × 10 = 3 GB of total data
blob = b'x' * (300 * 1024 * 1024)
data = [blob] * 10
a = np.array(data)

# test conversion to pyarrow array
print(type(pa.array(a)))  # -> pyarrow.lib.ChunkedArray

indices = np.array(list(range(10)))

# this throws an error, even though a is a legitimate numpy array
pa.DictionaryArray.from_arrays(indices, a)

'''
------- RESULT ----------
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "pyarrow\\array.pxi", line 4091, in pyarrow.lib.DictionaryArray.from_arrays
TypeError: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array
'''
```

### Component(s)

Python
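A possible workaround sketch (not a proposed fix, and not verified at this exact data size): the default binary type uses 32-bit offsets and is limited to 2 GB per array, which appears to be why pa.array() falls back to a ChunkedArray here. Requesting pa.large_binary() (64-bit offsets) for the dictionary values should keep the result as a single Array that from_arrays can accept; the int32 index type below is likewise only illustrative.

```
import numpy as np
import pyarrow as pa

blob = b'x' * (300 * 1024 * 1024)
data = [blob] * 10

# large_binary uses 64-bit offsets, so the 3 GB of values should fit in
# one Array instead of being split into a ChunkedArray (assumption).
dictionary = pa.array(np.array(data), type=pa.large_binary())
print(type(dictionary))  # expected: a single pyarrow Array, not ChunkedArray

indices = pa.array(list(range(10)), type=pa.int32())
dict_arr = pa.DictionaryArray.from_arrays(indices, dictionary)
print(dict_arr.type)     # expected: dictionary<values=large_binary, indices=int32>
```

Calling .combine_chunks() on the ChunkedArray might look like an alternative, but it would presumably hit the same 2 GB offset limit of the default binary type, hence the large_binary suggestion above.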
