lllangWV opened a new issue, #45208: URL: https://github.com/apache/arrow/issues/45208
### Describe the enhancement requested

### Enhancement Request: Custom Operator Support for PyArrow Extension Types in Compute Functions

Hello, pyarrow devs!

I have been using PyArrow's extension type mechanism to define custom types, which is extremely useful for extending Arrow's functionality. However, there is a significant limitation when these custom types are passed to compute functions. For example, `FixedShapeTensorType`, the extension type designed to hold fixed-shape `ndarray`s, raises an error when two such arrays are compared with `pc.equal`:

#### Example Code

```python
import pyarrow as pa
import pyarrow.compute as pc

tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))

arr_1 = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
storage_1 = pa.array(arr_1, pa.list_(pa.int32(), 4))
tensor_array_1 = pa.ExtensionArray.from_storage(tensor_type, storage_1)

arr_2 = [[1, 3, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
storage_2 = pa.array(arr_2, pa.list_(pa.int32(), 4))
tensor_array_2 = pa.ExtensionArray.from_storage(tensor_type, storage_2)

# This triggers an error
print(pc.equal(tensor_array_1, tensor_array_2))
```

#### Error Message

```text
    return func.call(args, None, memory_pool)
  File "pyarrow\_compute.pyx", line 385, in pyarrow._compute.Function.call
  File "pyarrow\error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Function 'equal' has no kernel matching input types (extension<arrow.fixed_shape_tensor[value_type=int32, shape=[2,2]]>, extension<arrow.fixed_shape_tensor[value_type=int32, shape=[2,2]]>)
```

### Proposed Solution

I believe it would be highly useful for PyArrow to let users define custom operator support for extension types, similar to how [Pandas enables operator support for `ExtensionArray`](https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extensionarray-operator-support).
#### Suggested Implementation

Here's an example of what the interface could look like:

```python
import pandas as pd
import pyarrow as pa

# data_utils, mp_utils and PythonObjectPandasDtype are project-specific helpers
# that are not shown in this issue.


class PythonObjectArrowType(pa.ExtensionType):
    def __init__(self):
        super().__init__(pa.binary(), "parquetdb.PythonObjectArrow")

    def __arrow_ext_serialize__(self):
        return b""

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return PythonObjectArrowType()

    def __arrow_ext_class__(self):
        return PythonObjectArrowArray

    def to_pandas_dtype(self):
        return PythonObjectPandasDtype()

    def __arrow_ext_scalar_class__(self):
        return PythonObjectArrowScalar


class PythonObjectArrowScalar(pa.ExtensionScalar):
    def as_py(self):
        # Deserialize the stored bytes back into a Python object.
        return data_utils.load_python_object(self.value.as_py())

    def __eq__(self, other):
        # Custom element-wise equality: compare the underlying binary storage values.
        if not isinstance(other, PythonObjectArrowScalar):
            return NotImplemented
        return self.value == other.value


class PythonObjectArrowArray(pa.ExtensionArray):
    def to_pandas(self, **kwargs):
        values = self.storage.to_numpy(zero_copy_only=False)
        results = mp_utils.parallel_apply(data_utils.load_python_object, values)
        return pd.Series(results)

    def to_values(self, **kwargs):
        values = self.storage.to_pandas(**kwargs).values
        results = mp_utils.parallel_apply(data_utils.load_python_object, values)
        return results


# Register the type once all of the classes above are defined.
pa.register_extension_type(PythonObjectArrowType())
```

In this example, the `PythonObjectArrowScalar` class defines an `__eq__` method, enabling custom equality comparisons between scalar elements. Similarly, the `PythonObjectArrowArray` class can provide custom implementations for data conversion and manipulation.

### Challenges

While defining `__eq__` on the scalar class is straightforward, I am uncertain how this would integrate with compute functions like `pc.equal`. It may require exposing additional hooks or mechanisms in PyArrow so that users can register their own operator implementations.

Please let me know if additional details or examples are needed.

Best,
Logan Lang

### Component(s)

C++, Python

-- 
This is an automated message from the Apache Git Service.