rmuir commented on issue #12621: URL: https://github.com/apache/lucene/issues/12621#issuecomment-1747044386
As far as ARM goes, the fact that it has only 128-bit SIMD is the limiting factor. For e.g. AVX-256, we use a 64-bit vector of 8 byte values -> 128-bit vector of 8 short values -> 256-bit vector of 8 int values. For ARM/NEON with only 128-bit, we can't do this as we don't have 256-bit vectors. So instead we use a 64-bit vector of 8 byte values -> 128-bit vector of 8 short values -> 2 128-bit vectors of 4 int values each. It requires splitting the vector in half, but that is all we can do. If you want it to be faster, get an ARM with SVE SIMD, which supports bigger vectors than NEON.
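To make the two widening chains concrete, here is a rough scalar model in plain Java. It is only a sketch, not the actual Lucene code: the class and method names (`WideningSketch`, `dot256`, `dot128`) are made up for illustration, a byte dot product is assumed as the operation being vectorized, and ordinary arrays stand in for SIMD registers. `dot256` keeps all 8 int lanes in one accumulator, while `dot128` has to split each group of 8 products into a low and a high half of 4 int lanes each.

```java
final class WideningSketch {

    // Widen 8 byte pairs and multiply them as shorts.
    // A byte*byte product always fits in 16 bits (largest is -128 * -128 = 16384).
    private static short[] products(byte[] a, byte[] b, int offset) {
        short[] prod = new short[8];
        for (int lane = 0; lane < 8; lane++) {
            prod[lane] = (short) (a[offset + lane] * b[offset + lane]);
        }
        return prod;
    }

    // Scalar handling of the elements that do not fill a whole group of 8.
    private static int tail(byte[] a, byte[] b, int from) {
        int sum = 0;
        for (int i = from; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Models the 256-bit path: 8 shorts widen into one accumulator of 8 int lanes.
    public static int dot256(byte[] a, byte[] b) {
        int[] acc = new int[8];
        int i = 0;
        for (; i + 8 <= a.length; i += 8) {
            short[] prod = products(a, b, i);
            for (int lane = 0; lane < 8; lane++) {
                acc[lane] += prod[lane];
            }
        }
        int sum = 0;
        for (int v : acc) {
            sum += v;
        }
        return sum + tail(a, b, i);
    }

    // Models the 128-bit path: the 8 shorts are split in half and widened
    // into two accumulators of 4 int lanes each.
    public static int dot128(byte[] a, byte[] b) {
        int[] lo = new int[4];
        int[] hi = new int[4];
        int i = 0;
        for (; i + 8 <= a.length; i += 8) {
            short[] prod = products(a, b, i);
            for (int lane = 0; lane < 4; lane++) {
                lo[lane] += prod[lane];      // lower half of the short vector
                hi[lane] += prod[4 + lane];  // upper half of the short vector
            }
        }
        int sum = 0;
        for (int lane = 0; lane < 4; lane++) {
            sum += lo[lane] + hi[lane];
        }
        return sum + tail(a, b, i);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        byte[] b = {2, 2, 2, 2, 2, 2, 2, 2, 2};
        System.out.println(dot256(a, b));
        System.out.println(dot128(a, b));
    }
}
```

Both methods return the same result. The difference is that the 128-bit variant needs two widening steps and two accumulators per group of 8 bytes, which is the extra work described above.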