jorisvandenbossche opened a new issue, #43511:
URL: https://github.com/apache/arrow/issues/43511

   For https://github.com/apache/arrow/issues/41665 (implemented for Array in https://github.com/apache/arrow/issues/42112 / https://github.com/apache/arrow/pull/42113), we currently use the following assertion to check whether the data is on the CPU (and thus supports the operation in question, which accesses the data's address):
   
   
https://github.com/apache/arrow/blob/d4d92e4896d8108aef25c6ef199e87890d027b22/python/pyarrow/array.pxi#L2035-L2037
   
   This checks explicitly for the CPU device allocation type. However, this means that data with, for example, a CUDA_HOST device type, which is actually accessible from the CPU, will trigger this error:
   
   ```python
   import numpy as np
   import pyarrow as pa
   from pyarrow import cuda
   
   # create Array with CudaHost buffer
   buf = cuda.new_host_buffer(5*8)
   np.frombuffer(buf, dtype=np.int64)[:] = range(5)
   arr = pa.Array.from_buffers(pa.int64(), 5, [None, buf])
   
   # inspect the array
   >>> arr
   <pyarrow.lib.Int64Array object at 0x7f24b6e02e00>
   [
     0,
     1,
     2,
     3,
     4
   ]
   >>> arr.device_type
   <DeviceAllocationType.CUDA_HOST: 3>
   
   # calling a method that checks _assert_cpu errors
   >>> arr.sum()
   ...
   NotImplementedError: Implemented only for data on CPU device
   
   # but the underlying buffer itself "is_cpu"
   >>> arr.buffers()[1]
   <pyarrow.Buffer address=0x7f24c1600400 size=80 is_cpu=True is_mutable=True>
   >>> arr.buffers()[1].is_cpu
   True
   >>> arr.buffers()[1].device_type
   <DeviceAllocationType.CUDA_HOST: 3>
   ```
   
   At the buffer level we have the `is_cpu` attribute, but on the Array level we currently only have `device_type`. We could add the CUDA_HOST device allocation type explicitly to the check above, but could we use something more general instead?
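
A more general check could be built from the buffer-level attribute. The following is a minimal sketch, not pyarrow's API: the helpers `is_cpu_accessible` and `assert_cpu_accessible` are hypothetical, and the sketch assumes that checking `Buffer.is_cpu` on every entry of `Array.buffers()` is sufficient (whether that covers every layout, e.g. dictionary values, is not verified here). They are duck-typed, so they only rely on `buffers()` and `is_cpu`.

```python
from typing import Any


def is_cpu_accessible(arr: Any) -> bool:
    """Return True if every buffer backing ``arr`` reports ``is_cpu``.

    Hypothetical helper, not part of pyarrow. Absent buffers (e.g. a
    missing validity bitmap, which ``buffers()`` reports as ``None``)
    are skipped.
    """
    return all(buf.is_cpu for buf in arr.buffers() if buf is not None)


def assert_cpu_accessible(arr: Any) -> None:
    # Hypothetical replacement for a strict ``device_type == CPU`` check:
    # CUDA host buffers report is_cpu=True, so they would pass here.
    if not is_cpu_accessible(arr):
        raise NotImplementedError("Implemented only for data on CPU device")
```

For the CUDA_HOST-backed array in the example above, `buffers()` is `[None, <Buffer ... is_cpu=True ...>]`, so this check would accept it, whereas the current `device_type`-based assertion rejects it.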
   
   (cc @danepitkin)
   