goankur commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2235177271
> Do we even need to use intrinsics? The function is so simple that the compiler seems to do the right thing, e.g. use the `SDOT` dot product instruction, given the correct flags:
>
> https://godbolt.org/z/KG1dPnrqn
>
> https://developer.arm.com/documentation/102651/a/What-are-dot-product-intructions-
>
> I haven't benchmarked, it just seems `SDOT` is the one to optimize for, and GCC can both recognize the code shape and autovectorize to it without hassle.
>
> My cheap 2021 phone has the `asimddp` feature in /proc/cpuinfo, so dot product support seems widespread.
>
> You can use it directly via an intrinsic, too; no need to use add/multiply intrinsics: https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#dot-product
>
> But unless it is really faster than what GCC does with simple C, there is no need.

With the updated compile flags, the auto-vectorized code performs slightly better than the explicitly vectorized code (see results). Interestingly, both C-based implementations have `10X` better throughput than the Panama API based Java implementation (unless I am not doing an apples-to-apples comparison).
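To make the "simple C" from the quoted comment concrete, here is a minimal sketch of an int8 dot product of the kind being benchmarked. It is not the PR's code: the function names `dot_product_i8` and `dot_product_i8_sdot` are hypothetical, and compiling with something like `-O3 -march=armv8.2-a+dotprod` is an assumption about what "the correct flags" means. Whether GCC actually emits `SDOT` for this exact loop is best confirmed on Compiler Explorer. The second function is compiled only on AArch64 targets that advertise the dot product extension, and calls the ACLE `vdotq_s32` intrinsic directly, as the quoted comment suggests.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/* Hypothetical helper: plain scalar int8 dot product with an int32
   accumulator. Given dot-product-capable target flags, GCC may
   auto-vectorize this loop to SDOT; check the generated assembly. */
int32_t dot_product_i8(const int8_t *a, const int8_t *b, size_t n) {
    int32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (int32_t)a[i] * (int32_t)b[i];
    }
    return total;
}

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
/* Hypothetical explicit version using the ACLE vdotq_s32 intrinsic:
   each call multiplies 16 int8 pairs and adds each group of 4 products
   into one of the 4 int32 lanes of the accumulator. */
int32_t dot_product_i8_sdot(const int8_t *a, const int8_t *b, size_t n) {
    int32x4_t acc = vdupq_n_s32(0);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        acc = vdotq_s32(acc, vld1q_s8(a + i), vld1q_s8(b + i));
    }
    int32_t total = vaddvq_s32(acc); /* horizontal sum of the 4 lanes */
    for (; i < n; i++) {             /* scalar tail for leftover elements */
        total += (int32_t)a[i] * (int32_t)b[i];
    }
    return total;
}
#endif
```

Comparing both functions on the same inputs is a cheap correctness check before benchmarking. The int32 accumulator assumes vectors short enough that the sum cannot overflow: each product is at most 16384 in magnitude, so fewer than about 131,000 elements is safe.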