Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

via GitHub Fri, 12 Dec 2025 10:38:41 -0800


shubhamvishu commented on PR #13572:
URL: https://github.com/apache/lucene/pull/13572#issuecomment-3647679954


   Hi,
   
   Here at Amazon (customer-facing product search), we’ve been testing this
native dot product implementation in our production environment (ARM, Graviton
2 and 3). We see **5-14x** faster dot product computations in JMH benchmarks,
and we observed semantic latency improving from **62 msec** to **28 msec**
(avg) for 4K embeddings (4.5 MM). Overall we saw a **10-60%** improvement in
end-to-end avg search latencies across different scenarios (different sized
vectors, vector-focused search vs search combined with other workloads). We
haven’t yet tested all other CPU types. I'm working on a draft PR on top of
this PR with the following changes and plan to raise it soon:
   
   - Removing the heap to off-heap copy overhead by using
`Linker.Option.critical`
   - Runtime dispatch using `IFUNC` to choose the SVE, NEON, or scalar
implementation at runtime based on the CPU features available
   - Build-related changes to generate the binary
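
   As a rough illustration of the runtime dispatch idea (this is not the code
from the PR), a minimal C sketch might look like the following. It resolves the
implementation once through a plain function pointer, as a portable stand-in
for `IFUNC`, and chooses only between NEON and scalar; the SVE path is left
out. The names `dot8s`, `dot8s_scalar`, `dot8s_neon` and `resolve_dot8s` are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__) && defined(__ARM_NEON)
#define HAVE_NEON_PATH 1
#include <arm_neon.h>
#include <sys/auxv.h> /* getauxval, AT_HWCAP, HWCAP_ASIMD */
#endif

/* Portable fallback: widen each int8 product to int32 before summing. */
static int32_t dot8s_scalar(const int8_t *a, const int8_t *b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}

#ifdef HAVE_NEON_PATH
/* NEON path: 8 lanes per step, int8 x int8 -> int16, pairwise-add into int32. */
static int32_t dot8s_neon(const int8_t *a, const int8_t *b, size_t n) {
    int32x4_t acc = vdupq_n_s32(0);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        int16x8_t prod = vmull_s8(vld1_s8(a + i), vld1_s8(b + i));
        acc = vpadalq_s16(acc, prod);
    }
    int32_t sum = vaddvq_s32(acc); /* horizontal add, AArch64 only */
    for (; i < n; i++) {           /* scalar tail for the last n % 8 items */
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
#endif

typedef int32_t (*dot8s_fn)(const int8_t *, const int8_t *, size_t);

/* Pick an implementation from the CPU features the kernel reports. */
static dot8s_fn resolve_dot8s(void) {
#ifdef HAVE_NEON_PATH
    if (getauxval(AT_HWCAP) & HWCAP_ASIMD) {
        return dot8s_neon;
    }
#endif
    return dot8s_scalar;
}

static dot8s_fn dot8s_impl = NULL;

/* Public entry point. The lazy resolve below is not synchronized;
 * it only stands in for the load-time resolution that IFUNC provides. */
int32_t dot8s(const int8_t *a, const int8_t *b, size_t n) {
    if (dot8s_impl == NULL) {
        dot8s_impl = resolve_dot8s();
    }
    return dot8s_impl(a, b, n);
}
```

   With a real `IFUNC` symbol, the dynamic loader runs the resolver once when
the symbol is bound, so the per-call null check above goes away.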
   
   We kept the native code isolated in the misc package rather than adding it
to the core module, which we know is highly discouraged. Additionally, PR
#15285 would later help eliminate some code duplication and enable a cleaner
implementation similar to `PanamaVectorUtilSupport` - potentially through a
`NativeVectorUtilSupport` class?
   
   Our benchmarking suggests substantial optimization potential for ARM-based
deployments, and we believe this could benefit the broader Lucene community.
Ideally, we hope to make it easy for any Lucene user to opt in to this
alternative vector implementation. We're committed to refining it based on
community feedback and addressing any concerns during the review process. I'm
eager to hear the community's thoughts on this change. Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


