laskoviymishka opened a new issue, #736: URL: https://github.com/apache/arrow-go/issues/736
### Describe the enhancement requested Found this during research around https://github.com/apache/iceberg-go/pull/818#discussion_r3005436456 The `is_in` kernel for binary types allocates once per input element because the `[]byte` value escapes to the heap through a closure chain. This makes `is_in` on binary arrays ~2x slower than a manual `map[string]struct{}` lookup. The call chain in the `is_in` hot path: 1. `visitBinary` slices `rawBytes[offsets[pos]:offsets[pos+1]]` and passes it to `valid(slice)` callback 2. The callback in `isInKernelExec` calls `state.Lookup.Exists(v)` 3. `BinaryMemoTable.Exists` calls `b.lookup(hash, val)` 4. `lookup` calls `b.tbl.Lookup(h, func(i int32) bool { return bytes.Equal(val, ...) })` The closure at step 4 captures `val []byte`. Since it's passed to `HashTable.Lookup` (which takes `cmp func(T) bool`), Go's escape analysis conservatively moves `val` to the heap. This propagates up — the slice created at step 1 also escapes. **Result:** 1 heap allocation per input element, regardless of whether the value exists in the set. ## Evidence Profiling `is_in` on a 1M-element Binary array with a 10-element value set: ``` 255203025 94.92% BinaryMemoTable.Exists (flat alloc_objects) 10726303 3.99% BinaryMemoTable.InsertOrGet ``` ~1M allocs from `Exists` alone, matching the input size exactly. ## Validated Fix Using the `noescape` unsafe.Pointer trick (same pattern as Go runtime's `runtime/stubs.go`): ```go func noescape(p unsafe.Pointer) unsafe.Pointer { x := uintptr(p) return unsafe.Pointer(x ^ 0 ^ 0) } func noescapeBytes(val []byte) []byte { p := noescape(unsafe.Pointer(unsafe.SliceData(val))) return unsafe.Slice((*byte)(p), len(val)) } ``` Combined with `isInBinaryDirect` that iterates the binary array directly (bypassing `visitBinary`/`VisitBitBlocksShort` closure chain) and calls `ExistsDirect` which inlines the hash probe loop: **Before:** 1,000,231 allocs/op at 1M rows **After:** 225 allocs/op at 1M rows (4,445x reduction) Speed improved ~10% (51ms -> 35ms for 1M binary keys against 10-element value set). ## Suggested Fix Options 1. **`noescape` trick** (validated, minimal change) — add `ExistsDirect` to `BinaryMemoTable` + `isInBinaryDirect` to the kernel 2. **Restructure `HashTable.Lookup`** to avoid the `cmp func(T) bool` closure — e.g., add `LookupEqual(hash uint64, val T, eq func(a, b T) bool)` where the comparator is a package-level function (won't capture locals) 3. **Change `BinaryMemoTable.Exists` signature** to accept `(buf []byte, offset, length int)` to avoid sub-slicing ## Reproduction Any `is_in` call with Binary type input will show ~1 alloc per element in benchmarks. ### Component(s) Benchmarking -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
