rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1561601759
I am not sure how widely SVE is implemented, but I would love to try it out, since we wouldn't have the 128-bit vector restriction there.

On ARM with this branch, the floating-point algorithms are fast, but the integer algorithms are slowish (only a 2x speedup). That is not because of ARM; it is because of the Vector API. I saw the same results with 128-bit vectors on Intel too. Processing a vector in "parts" is very slow (e.g. splitting a ByteVector into two ShortVectors). It is so slow that it is actually faster to ignore the second "part" and process only half of the ByteVector in each loop iteration, doing overlapping reads and re-reading the second part in the next iteration! (No, I don't want to change the code to do this.)

I still want to try "casting" byte to short myself, without using the JDK type conversion support. That just means applying a shuffle and ANDing with a mask. Ugly as hell, but it probably speeds up the 128-bit case on ARM. I have not tried it yet.
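For readers trying to picture the shuffle-and-mask "cast", here is a minimal lane-by-lane model in plain Java. It is only a sketch of one possible reading of the idea, not code from the branch: the class and method names are hypothetical, it uses arrays instead of the Vector API so it shows the data movement and says nothing about performance, and it zero-extends, so it matches a signed byte-to-short conversion only for non-negative bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical scalar model of widening one half ("part") of a 128-bit
// byte vector to shorts with a shuffle plus an AND mask.
final class ShuffleMaskWiden {
    static final int BYTE_LANES = 16;  // 128 bits of byte lanes
    static final int SHORT_LANES = 8;  // 128 bits of short lanes

    // Models a byte shuffle: output lane i takes input lane indexes[i].
    public static byte[] shuffle(byte[] lanes, int[] indexes) {
        byte[] out = new byte[BYTE_LANES];
        for (int i = 0; i < BYTE_LANES; i++) {
            out[i] = lanes[indexes[i]];
        }
        return out;
    }

    // Models reinterpreting the same 128 bits as 8 short lanes.
    public static short[] reinterpretAsShorts(byte[] lanes) {
        short[] out = new short[SHORT_LANES];
        ByteBuffer.wrap(lanes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(out);
        return out;
    }

    // Widens part 0 (bytes 0..7) or part 1 (bytes 8..15) to shorts.
    public static short[] widenPart(byte[] lanes, int part) {
        // Duplicate each source byte into both bytes of its short lane:
        // indexes are [p, p, p+1, p+1, ...] with p = part * 8.
        int[] indexes = new int[BYTE_LANES];
        for (int i = 0; i < BYTE_LANES; i++) {
            indexes[i] = part * SHORT_LANES + i / 2;
        }
        short[] shorts = reinterpretAsShorts(shuffle(lanes, indexes));
        // Clear the duplicated high byte, leaving the zero-extended value.
        for (int i = 0; i < SHORT_LANES; i++) {
            shorts[i] &= 0x00FF;
        }
        return shorts;
    }
}
```

Calling `ShuffleMaskWiden.widenPart(bytes, 0)` and `ShuffleMaskWiden.widenPart(bytes, 1)` on a 16-element array gives the two 8-lane halves. A byte of -1 comes out as 255 rather than -1, which is the zero-extension caveat noted above.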