laskoviymishka opened a new issue, #736:
URL: https://github.com/apache/arrow-go/issues/736

   ### Describe the enhancement requested
   
   Found this during research around 
https://github.com/apache/iceberg-go/pull/818#discussion_r3005436456
   The `is_in` kernel for binary types allocates once per input element because 
the `[]byte` value escapes to the heap through a closure chain. This makes 
`is_in` on binary arrays ~2x slower than a manual `map[string]struct{}` lookup.
   
   The call chain in the `is_in` hot path:
   
   1. `visitBinary` slices `rawBytes[offsets[pos]:offsets[pos+1]]` and passes 
it to `valid(slice)` callback
   2. The callback in `isInKernelExec` calls `state.Lookup.Exists(v)`
   3. `BinaryMemoTable.Exists` calls `b.lookup(hash, val)`
   4. `lookup` calls `b.tbl.Lookup(h, func(i int32) bool { return 
bytes.Equal(val, ...) })`
   
   The closure at step 4 captures `val []byte`. Since it's passed to 
`HashTable.Lookup` (which takes `cmp func(T) bool`), Go's escape analysis 
conservatively moves `val` to the heap. This propagates up — the slice created 
at step 1 also escapes.
   
   **Result:** 1 heap allocation per input element, regardless of whether the 
value exists in the set.
   
   ## Evidence
   
   Profiling `is_in` on a 1M-element Binary array with a 10-element value set:
   
   ```
   255203025 94.92%  BinaryMemoTable.Exists    (flat alloc_objects)
    10726303  3.99%  BinaryMemoTable.InsertOrGet
   ```
   
   ~1M allocs from `Exists` alone, matching the input size exactly.
   
   ## Validated Fix
   
   Using the `noescape` unsafe.Pointer trick (same pattern as Go runtime's 
`runtime/stubs.go`):
   
   ```go
   func noescape(p unsafe.Pointer) unsafe.Pointer {
       x := uintptr(p)
       return unsafe.Pointer(x ^ 0 ^ 0)
   }
   
   func noescapeBytes(val []byte) []byte {
       p := noescape(unsafe.Pointer(unsafe.SliceData(val)))
       return unsafe.Slice((*byte)(p), len(val))
   }
   ```
   
   Combined with `isInBinaryDirect` that iterates the binary array directly 
(bypassing `visitBinary`/`VisitBitBlocksShort` closure chain) and calls 
`ExistsDirect` which inlines the hash probe loop:
   
   **Before:** 1,000,231 allocs/op at 1M rows
   **After:**      225 allocs/op at 1M rows (4,445x reduction)
   
   Speed improved ~10% (51ms -> 35ms for 1M binary keys against 10-element 
value set).
   
   ## Suggested Fix Options
   
   1. **`noescape` trick** (validated, minimal change) — add `ExistsDirect` to 
`BinaryMemoTable` + `isInBinaryDirect` to the kernel
   2. **Restructure `HashTable.Lookup`** to avoid the `cmp func(T) bool` 
closure — e.g., add `LookupEqual(hash uint64, val T, eq func(a, b T) bool)` 
where the comparator is a package-level function (won't capture locals)
   3. **Change `BinaryMemoTable.Exists` signature** to accept `(buf []byte, 
offset, length int)` to avoid sub-slicing
   
   ## Reproduction
   
   Any `is_in` call with Binary type input will show ~1 alloc per element in 
benchmarks.
   
   
   ### Component(s)
   
   Benchmarking


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to