Alex-PLACET opened a new issue, #49677:
URL: https://github.com/apache/arrow/issues/49677

   ### Describe the enhancement requested
   
   Implement a kernel that replicates the semantics of NumPy’s [`searchsorted`](https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html) function (without the `sorter` argument).  
   The kernel should support all primitive types as well as run-end encoded arrays.
   
   ## Example API
   
   ```python
   >>> sorted_array = pa.array()
   >>> to_search = pa.array([50, 200, 250, 400])
   
   >>> pc.search_sorted(sorted_array, to_search, side='left')
   <pyarrow.lib.UInt64Array>
   [
     0,
     1,
     3,
     5
   ]
   
   >>> pc.search_sorted(sorted_array, to_search, side='right')
   <pyarrow.lib.UInt64Array>
   [
     0,
     3,
     3,
     5
   ]
   ```
   
   ### Explanation
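   - `50` is smaller than every element → index `0`
   - `200` with `side='left'` → index `1` (before the first occurrence of `200`)
   - `200` with `side='right'` → index `3` (after the last occurrence of `200`)
   - `250` falls between `200` and `300` → index `3`
   - `400` is larger than every element → index `5`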
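
The left/right semantics listed above correspond to `bisect_left` and `bisect_right` in Python's standard `bisect` module, so a small pure-Python sketch can serve as a reference when checking expected results. The function name `search_sorted_reference` and the contents of `sorted_values` are assumptions made for illustration (the values were chosen only to be consistent with the outputs shown above); this is not the proposed kernel.

```python
from bisect import bisect_left, bisect_right


def search_sorted_reference(sorted_values, keys, side="left"):
    """Return the insertion index of each key in an ascending list."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    # 'left' inserts before equal elements, 'right' inserts after them.
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, key) for key in keys]


# Hypothetical sample data, not taken from the issue.
sorted_values = [100, 200, 200, 300, 300]
keys = [50, 200, 250, 400]

print(search_sorted_reference(sorted_values, keys, side="left"))   # [0, 1, 3, 5]
print(search_sorted_reference(sorted_values, keys, side="right"))  # [0, 3, 3, 5]
```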
   
   ## Null Handling
   
   ### Nulls in the first (sorted) array
   
   If nulls are **clustered first**:
   
   ```python
   sorted_array = pa.array([None, 200, 300, 300])
   to_search = pa.array([50, 200, 250, 400])
   ```
   
   Expected:
   - `side='left'` → `[1, 1, 2, 4]`
   - `side='right'` → `[1, 2, 2, 4]`
   
   If nulls are **clustered last**:
   
   ```python
   sorted_array = pa.array([200, 300, 300, None, None])
   ```
   
   Expected:
   - `side='left'` → `[0, 0, 1, 3]`
   - `side='right'` → `[0, 1, 1, 3]`
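
One way to read the two cases above is that the search runs only over the non-null slice, and the result is shifted by the null count when nulls come first. The sketch below illustrates that reading in pure Python, with `None` standing in for null. The function name `search_sorted_with_nulls` and the `nulls_first` flag are hypothetical and not part of any existing Arrow API.

```python
from bisect import bisect_left, bisect_right


def search_sorted_with_nulls(sorted_values, keys, side="left", nulls_first=True):
    """Search non-null keys in a list whose nulls (None) are clustered at one end."""
    null_count = sum(value is None for value in sorted_values)
    if nulls_first:
        # Skip the leading null run and shift results past it.
        non_null = sorted_values[null_count:]
        offset = null_count
    else:
        # Trailing nulls never affect the insertion index of a non-null key.
        non_null = sorted_values[:len(sorted_values) - null_count]
        offset = 0
    find = bisect_left if side == "left" else bisect_right
    return [offset + find(non_null, key) for key in keys]


keys = [50, 200, 250, 400]
print(search_sorted_with_nulls([None, 200, 300, 300], keys, "left", nulls_first=True))
# [1, 1, 2, 4]
print(search_sorted_with_nulls([200, 300, 300, None, None], keys, "right", nulls_first=False))
# [0, 1, 1, 3]
```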
   
   ### Nulls in the second (to_search) array
   
   Two options:
   1. Emit nulls for null search keys.
   2. Match nulls within the null portion of the sorted array.
   
   Example for (2):
   
   ```python
   sorted_array = pa.array([None, None, 200])   # nulls first
   to_search = pa.array([None, 100, 300])
   # side='left' → 
   # side='right' → 
   ```
   
   ```python
   sorted_array = pa.array([200, None, None])   # nulls last
   to_search = pa.array([None, 100, 300])
   # side='left'  → 
   # side='right' → 
   ```
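
To make the two options concrete, here is a hedged pure-Python sketch that supports both behind a flag. It assumes that, under option (2), a null key with `side='left'` maps to the start of the null run and with `side='right'` to its end. That interpretation is an assumption, as are the names `search_sorted_null_keys` and `match_nulls`, none of which come from the issue or from Arrow.

```python
from bisect import bisect_left, bisect_right


def search_sorted_null_keys(sorted_values, keys, side="left",
                            nulls_first=True, match_nulls=False):
    """Search keys that may be None in a list whose Nones are clustered at one end."""
    null_count = sum(value is None for value in sorted_values)
    non_null = [value for value in sorted_values if value is not None]
    # Logical index where the null run and the value run begin.
    null_start = 0 if nulls_first else len(non_null)
    value_start = null_count if nulls_first else 0
    find = bisect_left if side == "left" else bisect_right

    result = []
    for key in keys:
        if key is not None:
            result.append(value_start + find(non_null, key))
        elif not match_nulls:
            result.append(None)                     # option 1: emit null
        elif side == "left":
            result.append(null_start)               # option 2: before the null run
        else:
            result.append(null_start + null_count)  # option 2: after the null run
    return result


# Option 1 on the nulls-first example: the null key yields a null result.
print(search_sorted_null_keys([None, None, 200], [None, 100, 300]))
```

Keeping both behaviors behind one flag makes it easy to compare them on the same inputs before settling on a default.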
   
   ## Requirements
   
   - Implement for all **primitive types** (`int*`, `float*`, `boolean`, etc.)
   - Support **run-end encoded arrays**
   - Handle both `side='left'` and `side='right'`
   - Define consistent behavior for **null placement**
   - Return a `UInt64Array` of insertion indices
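
For the run-end encoded requirement, one possible approach is to binary-search the run values only and then translate the resulting run index into a logical index through the run ends, which avoids decoding the array. The sketch below assumes run ends are cumulative, exclusive logical end positions (as in Arrow's run-end encoded layout) and that there are no nulls. The function name `search_sorted_run_end_encoded` and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def search_sorted_run_end_encoded(run_ends, run_values, keys, side="left"):
    """Return logical insertion indices without decoding the runs."""
    find = bisect_left if side == "left" else bisect_right
    result = []
    for key in keys:
        run_index = find(run_values, key)
        # The logical start of run i is the end of run i - 1 (or 0 for the first run);
        # an index past the last run maps to the total logical length.
        result.append(run_ends[run_index - 1] if run_index > 0 else 0)
    return result


# Encodes the logical array [100, 200, 200, 300, 300] (hypothetical sample data).
run_ends = [1, 3, 5]
run_values = [100, 200, 300]
keys = [50, 200, 250, 400]

print(search_sorted_run_end_encoded(run_ends, run_values, keys, "left"))   # [0, 1, 3, 5]
print(search_sorted_run_end_encoded(run_ends, run_values, keys, "right"))  # [0, 3, 3, 5]
```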
   
   ## References
   
   - [NumPy `searchsorted` documentation](https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html)
   
   ### Component(s)
   
   Python

