Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

via GitHub Wed, 17 Jul 2024 19:29:08 -0700


goankur commented on PR #13572:
URL: https://github.com/apache/lucene/pull/13572#issuecomment-2235177271


   > Do we even need to use intrinsics? The function is so simple that the
   > compiler seems to do the right thing, e.g. use the `SDOT` dot product
   > instruction, given the correct flags:
   > 
   > https://godbolt.org/z/KG1dPnrqn
   > 
   > https://developer.arm.com/documentation/102651/a/What-are-dot-product-intructions-
   > 
   > I haven't benchmarked; it just seems `SDOT` is the one to optimize for,
   > and GCC can both recognize the code shape and autovectorize to it without
   > hassle.
   > 
   > My cheap 2021 phone has the `asimddp` feature in /proc/cpuinfo, so dot
   > product support seems widespread.
   > 
   > You can use it directly via an intrinsic, too; no need to use add/multiply
   > intrinsics:
   > https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#dot-product
   > 
   > But unless it is really faster than what GCC does with simple C, no need.
   
   With the updated compile flags, the auto-vectorized code performs slightly
better than the explicitly vectorized code (see results). It is interesting
to note that both C-based implementations have `10X` better throughput than
the Panama API based Java implementation (unless I am not making an
apples-to-apples comparison).
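
   For readers following along, here is a minimal sketch of the kind of "simple C" loop being discussed. It is not the code from the PR: the function name `dot8s_scalar` is hypothetical, and the exact compile flags used for the benchmark are not stated in this comment. With GCC on an AArch64 target, something like `-O3 -march=armv8.2-a+dotprod` is the sort of flag set under which the compiler may autovectorize such a loop to `SDOT`, but that should be confirmed by inspecting the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Plain scalar int8 dot product (hypothetical name, illustration only).
 * Each product is widened to int32_t before accumulation, which is the
 * code shape a compiler can map onto a signed dot-product instruction.
 * Assumes n is small enough that the int32_t accumulator does not overflow.
 */
int32_t dot8s_scalar(const int8_t *a, const int8_t *b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
```

   The loop has no target-specific code, so the same source also serves as a portable reference for checking an intrinsics-based version.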


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

