Alex-PLACET opened a new issue, #49677: URL: https://github.com/apache/arrow/issues/49677
### Describe the enhancement requested

Implement a kernel that replicates the semantics of NumPy’s [`searchsorted`](https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html) function (without the `sorter` argument). The kernel should support all primitive types as well as run-end encoded arrays.

## Example API

```python
>>> sorted_array = pa.array()
>>> to_search = pa.array()
>>> pc.search_sorted(sorted_array, to_search, side='left')
<pyarrow.lib.UInt64Array>
[
  0,
  1,
  3,
  5
]
>>> pc.search_sorted(sorted_array, to_search, side='right')
<pyarrow.lib.UInt64Array>
[
  0,
  3,
  3,
  5
]
```

### Explanation

- `50` is less than all values → index `0`
- `200` equals the first occurrence → index `1` (`side='left'`)
- `200` for `side='right'` → index `3`
- `250` falls between `200` and `300` → index `3`
- `400` is greater than all values → index `5`

## Null Handling

### Nulls in the first (sorted) array

If nulls are **clustered first**:

```python
sorted_array = pa.array([None, 200, 300, 300])
to_search = pa.array()
```

Expected:

- `side='left'` → `[1, 1, 2, 4]`
- `side='right'` → `[1, 2, 2, 4]`

If nulls are **clustered last**:

```python
sorted_array = pa.array([200, 300, 300, None, None])
```

Expected:

- `side='left'` → `[0, 0, 1, 3]`
- `side='right'` → `[0, 1, 1, 3]`

### Nulls in the second (to_search) array

Two options:

1. Emit nulls for null search keys.
2. Match nulls within the null portion of the sorted array.

Example for (2):

```python
sorted_array = pa.array([None, None, 200])  # nulls first
to_search = pa.array([None, 100, 300])
# side='left' →
# side='right' →
```

```python
sorted_array = pa.array([200, None, None])  # nulls last
to_search = pa.array([None, 100, 300])
# side='left' →
# side='right' →
```

## Requirements

- Implement for all **primitive types** (`int*`, `float*`, `boolean`, etc.)
- Support **run-end encoded arrays**
- Handle both `side='left'` and `side='right'`
- Define consistent behavior for **null placement**
- Return a `UInt64Array` of insertion indices

## References

- [NumPy `searchsorted` documentation](https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html)

### Component(s)

Python
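## Reference sketch: null placement

The null-handling rules above can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions, not the proposed kernel: `search_sorted_with_nulls` and its `nulls_first` parameter are hypothetical names, nulls are modelled as `None` and assumed to be clustered at one end of the sorted input, null search keys follow option 1 (emit null), and the search keys `[50, 200, 250, 400]` are assumed values chosen to be consistent with the expected results listed earlier.

```python
from bisect import bisect_left, bisect_right


def search_sorted_with_nulls(sorted_values, keys, side="left", nulls_first=True):
    """Hypothetical reference model of the requested null semantics.

    Assumes `sorted_values` is sorted with all nulls (None) clustered
    either at the start (nulls_first=True) or at the end.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    bisect = bisect_left if side == "left" else bisect_right

    # Search only the non-null portion, then shift by the leading null count.
    non_null = [v for v in sorted_values if v is not None]
    null_count = len(sorted_values) - len(non_null)
    offset = null_count if nulls_first else 0

    result = []
    for key in keys:
        if key is None:
            result.append(None)  # option 1: null key yields a null index
        else:
            result.append(offset + bisect(non_null, key))
    return result


keys = [50, 200, 250, 400]  # assumed search keys
print(search_sorted_with_nulls([None, 200, 300, 300], keys, "left"))   # [1, 1, 2, 4]
print(search_sorted_with_nulls([None, 200, 300, 300], keys, "right"))  # [1, 2, 2, 4]
print(search_sorted_with_nulls([200, 300, 300, None, None], keys, "left", nulls_first=False))   # [0, 0, 1, 3]
print(search_sorted_with_nulls([200, 300, 300, None, None], keys, "right", nulls_first=False))  # [0, 1, 1, 3]
```

With nulls last, no offset is needed because every non-null insertion point already precedes the null block.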

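## Reference sketch: run-end encoded input

One possible way to satisfy the run-end encoded requirement is to search the run values and then map the run position back to a logical index through the run ends. The sketch below is an assumption about how that could work, not a description of an existing Arrow kernel: `search_sorted_ree` is a hypothetical name, the run-end encoded array is modelled as two plain Python lists (`run_values`, and `run_ends` holding the exclusive logical end of each run), nulls are ignored, and the sample data is an assumed input consistent with the Example API results.

```python
from bisect import bisect_left, bisect_right


def search_sorted_ree(run_values, run_ends, keys, side="left"):
    """Hypothetical model: insertion indices into a sorted run-end encoded array.

    run_values[i] is the value of run i; run_ends[i] is the exclusive
    logical end index of run i (so run i starts at run_ends[i - 1], or 0).
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    bisect = bisect_left if side == "left" else bisect_right

    result = []
    for key in keys:
        # j = number of runs that sort entirely before the insertion point.
        j = bisect(run_values, key)
        # Those j runs cover logical positions [0, run_ends[j - 1]).
        result.append(run_ends[j - 1] if j > 0 else 0)
    return result


# Assumed data: logical array [100, 200, 200, 300, 300]
run_values = [100, 200, 300]
run_ends = [1, 3, 5]
keys = [50, 200, 250, 400]
print(search_sorted_ree(run_values, run_ends, keys, "left"))   # [0, 1, 3, 5]
print(search_sorted_ree(run_values, run_ends, keys, "right"))  # [0, 3, 3, 5]
```

The binary search runs over the number of runs rather than the logical length, which is the main reason to special-case this encoding instead of decoding it first.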