rmuir commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2230061569
I haven't benchmarked; it just seems that `SDOT` is the instruction to optimize for, and GCC can both recognize the code shape and autovectorize to it without hassle. My cheap 2021 phone lists the `asimddp` feature in `/proc/cpuinfo`, so dot product support seems widespread. You can also use it directly via an intrinsic (https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#dot-product), with no need for separate add/multiply intrinsics. But unless that is really faster than what GCC produces from simple C, there's no need.
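A minimal sketch of the two options, assuming an AArch64 target built with GCC or Clang and flags such as `-O3 -march=armv8.2-a+dotprod`. The function names `dot_i8_plain` and `dot_i8_sdot` are hypothetical and are not Lucene's actual native code. The first function is the kind of simple C loop the comment says GCC can autovectorize to `SDOT`. The second calls the ACLE `vdotq_s32` intrinsic from the linked page directly, and falls back to the plain loop when the compiler does not define `__ARM_FEATURE_DOTPROD`.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/* Hypothetical helper: plain C int8 dot product. This is the simple
 * shape that GCC can recognize and, with dot-product support enabled,
 * autovectorize to SDOT. */
int32_t dot_i8_plain(const int8_t *a, const int8_t *b, size_t n) {
    int32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* int8 operands are promoted to int before multiplying,
         * so each product (at most 16384) fits easily in int32. */
        total += a[i] * b[i];
    }
    return total;
}

/* Hypothetical helper: the same computation written with the ACLE
 * dot-product intrinsic vdotq_s32 (assumes AArch64 + dotprod). */
int32_t dot_i8_sdot(const int8_t *a, const int8_t *b, size_t n) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
    int32x4_t acc = vdupq_n_s32(0);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        /* Each of the 4 int32 lanes accumulates the dot product of
         * 4 consecutive int8 pairs: one SDOT per 16 elements. */
        acc = vdotq_s32(acc, vld1q_s8(a + i), vld1q_s8(b + i));
    }
    int32_t total = vaddvq_s32(acc); /* horizontal sum of the 4 lanes */
    for (; i < n; i++) {             /* scalar tail for the last n % 16 */
        total += a[i] * b[i];
    }
    return total;
#else
    /* No dot-product support advertised: use the plain loop. */
    return dot_i8_plain(a, b, n);
#endif
}
```

To see whether GCC actually emitted `SDOT` for the plain loop, compile with `-S` and look for `sdot` in the assembly. Following the comment's reasoning, the intrinsic version is only worth keeping if a benchmark shows it beating what the compiler already generates from `dot_i8_plain`.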