r-matejko opened a new issue, #47246:
URL: https://github.com/apache/arrow/issues/47246

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   According to the documentation, NumPy arrays passed to pyarrow.array(...) are converted to either a pyarrow.Array or a pyarrow.ChunkedArray, depending on their size.
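   
   For comparison, a small ndarray converts to a plain pyarrow.Array (a minimal sketch of the documented behaviour):
   
   ```
   import pyarrow as pa
   import numpy as np
   
   small = np.array([b"x", b"y", b"z"])
   # small inputs come back as a single Array (here a BinaryArray)
   print(type(pa.array(small)))
   ```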
   
   pa.DictionaryArray.from_arrays(...) is also documented to accept ndarrays as the dictionary values, but when the ndarray is large, the call fails with an internal exception because it cannot handle the ChunkedArray that the conversion produces.
   
   ```
   import pyarrow as pa
   import numpy as np
   
   # 300 MB binary blobs × 10 = 3 GB of total data
   blob = b'x' * (300 * 1024 * 1024)
   data = [blob] * 10
   a = np.array(data)
   
   # test conversion to pyarrow array
   print(type(pa.array(a))) # -> pyarrow.lib.ChunkedArray
   
   indices = np.array(list(range(10)))
   
   # this throws an error, even if a is a legit numpy array
   pa.DictionaryArray.from_arrays(indices, a)
   
   '''
   ------- RESULT ----------
   Traceback (most recent call last):
   File "<input>", line 1, in <module>
   File "pyarrow\\array.pxi", line 4091, in 
pyarrow.lib.DictionaryArray.from_arrays
   TypeError: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array
   '''
   ```
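   
   A possible workaround (a sketch only, not verified against the full 3 GB input): requesting pa.large_binary(), which uses 64-bit offsets, seems to avoid the chunking, so from_arrays receives a plain Array. Calling .combine_chunks() on the ChunkedArray would presumably hit the same 2 GB binary offset limit that caused the chunking in the first place.
   
   ```
   import pyarrow as pa
   import numpy as np
   
   blob = b'x' * (300 * 1024 * 1024)
   a = np.array([blob] * 10)
   
   # large_binary uses 64-bit offsets, so the 3 GB input should fit in one Array
   values = pa.array(a, type=pa.large_binary())
   print(type(values))  # expected: a LargeBinaryArray, not a ChunkedArray
   
   indices = np.arange(10)
   dict_arr = pa.DictionaryArray.from_arrays(indices, values)
   ```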
   
   ### Component(s)
   
   Python

