rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1561601759

   I am not sure how well SVE is implemented, but I would love to try it out,
since with SVE we wouldn't have the 128-bit vector restriction.
   
   On the ARM machine with this branch, the floating-point algorithms are fast,
but the integer algorithms are slowish (only a 2x speedup). This is not because
of ARM; it is because of the Vector API. I saw the same results with 128-bit
vectors on Intel too.
   
   Processing a vector in "parts" is very slow (e.g. splitting a ByteVector
into two ShortVectors). It is so slow that it is actually faster to just ignore
the second "part" and process only half of the ByteVector in each loop
iteration, doing overlapping reads so that the second half is re-read in the
next iteration! (No, I don't want to change the code to do this.)
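
To make the loop shape concrete, here is a scalar model in plain Java (no Vector API; the class `OverlappingReads`, the constants `LANES`/`HALF`, and both methods are hypothetical names) of the half-processing pattern for a byte dot product. Each step "loads" 16 bytes, as a 128-bit ByteVector load would, but consumes only the lower 8 and then advances by 8, so the upper half is read again as the next step's lower half. It illustrates only the indexing and tail handling, not the performance.

```java
import java.util.Arrays;

final class OverlappingReads {
  static final int LANES = 16;       // lanes in one hypothetical 128-bit byte "register"
  static final int HALF = LANES / 2; // lanes actually consumed per step

  /** Dot product that reads 16 bytes per step but only uses the lower 8. Assumes equal lengths. */
  static int dotProductHalf(byte[] a, byte[] b) {
    int sum = 0;
    int i = 0;
    // Continue only while a full 16-byte "load" stays inside the arrays.
    for (; i + LANES <= a.length; i += HALF) {
      byte[] va = Arrays.copyOfRange(a, i, i + LANES); // models a 128-bit load
      byte[] vb = Arrays.copyOfRange(b, i, i + LANES);
      // Lanes HALF..LANES-1 are ignored; the next step re-reads them as its lower half.
      for (int lane = 0; lane < HALF; lane++) {
        sum += va[lane] * vb[lane];
      }
    }
    // Scalar tail for the elements the vector-style loop did not consume.
    for (; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Plain reference implementation to compare against. */
  static int dotProductScalar(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    byte[] a = new byte[40];
    byte[] b = new byte[40];
    for (int i = 0; i < a.length; i++) {
      a[i] = (byte) (i * 7 - 100);
      b[i] = (byte) (50 - i * 3);
    }
    System.out.println(dotProductHalf(a, b) == dotProductScalar(a, b));
  }
}
```

The loop condition checks the full 16-byte read, not just the 8 bytes used, because an overlapping load must never run past the end of the array; whatever is left over falls to the scalar tail.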
   
   I still want to try "casting" byte to short myself, without using the JDK's
type conversion support. That just means applying a shuffle and ANDing with a
mask. Ugly as hell, but it would probably speed up the 128-bit code on ARM. I
have not tried it yet.
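
A scalar model in plain Java (hypothetical names, not Vector API code) of the hand-rolled widening for one 16-byte register: a shuffle copies source byte k into both bytes of 16-bit lane k, the register is reinterpreted as eight little-endian shorts, and an AND with 0x00FF clears the duplicated high byte. One caveat worth flagging: the AND zero-extends, so it matches the JDK's sign-extending `B2S` conversion only for non-negative bytes. The `widenSigned` variant is a swapped-in alternative (not from the comment) that replaces the AND with an arithmetic right shift by 8, which works because the duplicating shuffle already placed the source byte in the high half of each lane.

```java
import java.util.Arrays;

final class ManualWiden {
  // Shuffle indices: output bytes 2k and 2k+1 both take source byte k (lower 8 source bytes only).
  static final int[] DUPLICATE_LOW_HALF = {0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7};

  /** Applies the shuffle, then reinterprets the 16 bytes as 8 little-endian shorts. */
  static short[] shuffleAndReinterpret(byte[] reg) {
    byte[] shuffled = new byte[16];
    for (int j = 0; j < 16; j++) {
      shuffled[j] = reg[DUPLICATE_LOW_HALF[j]];
    }
    short[] lanes = new short[8];
    for (int k = 0; k < 8; k++) {
      // Little-endian: byte 2k is the low byte of lane k, byte 2k+1 the high byte.
      lanes[k] = (short) ((shuffled[2 * k] & 0xFF) | (shuffled[2 * k + 1] << 8));
    }
    return lanes;
  }

  /** Shuffle + AND with 0x00FF: zero-extends, i.e. treats the bytes as unsigned. */
  static short[] widenUnsigned(byte[] reg) {
    short[] lanes = shuffleAndReinterpret(reg);
    for (int k = 0; k < lanes.length; k++) {
      lanes[k] = (short) (lanes[k] & 0x00FF);
    }
    return lanes;
  }

  /** Shuffle + arithmetic shift right by 8: sign-extends, like a (short) cast of each byte. */
  static short[] widenSigned(byte[] reg) {
    short[] lanes = shuffleAndReinterpret(reg);
    for (int k = 0; k < lanes.length; k++) {
      lanes[k] = (short) (lanes[k] >> 8);
    }
    return lanes;
  }

  public static void main(String[] args) {
    byte[] reg = {-1, 2, -128, 127, 0, 5, -7, 100, 9, 9, 9, 9, 9, 9, 9, 9};
    System.out.println(Arrays.toString(widenUnsigned(reg)));
    System.out.println(Arrays.toString(widenSigned(reg)));
  }
}
```

In Vector API terms this would presumably correspond to a `rearrange` with a `VectorShuffle`, then `reinterpretAsShorts()`, then a lanewise AND (or arithmetic shift); whether that actually beats `convertShape` on ARM is the open question here.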


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

