lllangWV opened a new issue, #45208:
URL: https://github.com/apache/arrow/issues/45208

   ### Describe the enhancement requested
   
   ### Enhancement Request: Custom Operator Support for PyArrow Extension Types in Compute Functions
   
   Hello, pyarrow devs!
   
   I have been using the PyArrow extension capability to define custom types, 
which is extremely useful for extending Arrow's functionality. However, a 
significant limitation arises when using these custom types with compute 
functions.
   
   For example, `FixedShapeTensorType`, the extension type for fixed-shape 
`ndarray`s, raises an error when two such arrays are compared with `pc.equal`:
   
   #### Example Code
   ```python
   import pyarrow as pa
   import pyarrow.compute as pc
   
   tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
   
   arr_1 = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
   storage_1 = pa.array(arr_1, pa.list_(pa.int32(), 4))
   tensor_array_1 = pa.ExtensionArray.from_storage(tensor_type, storage_1)
   
   arr_2 = [[1, 3, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
   storage_2 = pa.array(arr_2, pa.list_(pa.int32(), 4))
   tensor_array_2 = pa.ExtensionArray.from_storage(tensor_type, storage_2)
   
   # This triggers an error
   print(pc.equal(tensor_array_1, tensor_array_2))
   ```
   
   #### Error Message
   ```bash
     return func.call(args, None, memory_pool)
     File "pyarrow\\_compute.pyx", line 385, in pyarrow._compute.Function.call
     File "pyarrow\\error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow\\error.pxi", line 92, in pyarrow.lib.check_status
   pyarrow.lib.ArrowNotImplementedError: Function 'equal' has no kernel matching input types (extension<arrow.fixed_shape_tensor[value_type=int32, shape=[2,2]]>, extension<arrow.fixed_shape_tensor[value_type=int32, shape=[2,2]]>)
   ```
   
   ### Proposed Solution
   I believe it would be highly useful for PyArrow to allow users to define 
custom operator support for extension types, similar to how [Pandas enables 
operator support for 
`ExtensionArray`](https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extensionarray-operator-support).
   
   #### Suggested Implementation
   Here is an example of what the interface could look like:
   
   ```python
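   import pandas as pd
   import pyarrow as pa
   
   # data_utils, mp_utils and PythonObjectPandasDtype are project-specific
   # helpers that are not shown here.
   
   
   class PythonObjectArrowType(pa.ExtensionType):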
       def __init__(self):
           super().__init__(pa.binary(), "parquetdb.PythonObjectArrow")
   
       def __arrow_ext_serialize__(self):
           return b""
   
       @classmethod
       def __arrow_ext_deserialize__(cls, storage_type, serialized):
           return cls()
   
       def __arrow_ext_class__(self):
           return PythonObjectArrowArray
   
       def to_pandas_dtype(self):
           return PythonObjectPandasDtype()
   
       def __arrow_ext_scalar_class__(self):
           return PythonObjectArrowScalar
   
   
   pa.register_extension_type(PythonObjectArrowType())
   
   
   class PythonObjectArrowScalar(pa.ExtensionScalar):
       def as_py(self):
           return data_utils.load_python_object(self.value.as_py())
   
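       def __eq__(self, other):
           # Only compare against scalars of the same extension class.
           if not isinstance(other, PythonObjectArrowScalar):
               return NotImplemented
           return self.value == other.value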
   
   
   class PythonObjectArrowArray(pa.ExtensionArray):
       def to_pandas(self, **kwargs):
           values = self.storage.to_numpy(zero_copy_only=False)
           results = mp_utils.parallel_apply(data_utils.load_python_object, values)
           return pd.Series(results)
   
       def to_values(self, **kwargs):
           values = self.storage.to_pandas(**kwargs).values
           results = mp_utils.parallel_apply(data_utils.load_python_object, values)
           return results
   ```
   
   In this example, the `PythonObjectArrowScalar` class defines an `__eq__` 
method, enabling custom equality comparisons for the scalar elements. 
Similarly, the `PythonObjectArrowArray` class can provide custom 
implementations for data conversion and manipulation.
   
   ### Challenges
   While defining `__eq__` in the scalar class is straightforward, I am 
uncertain how this would integrate into compute functions like `pc.equal`. It 
may require exposing additional hooks or mechanisms in PyArrow to allow users 
to register their operator implementations.
   
   Please let me know if additional details or examples are needed.
   
   Best, 
   
   Logan Lang
   
   ### Component(s)
   
   C++, Python


