rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557149306

   > I don't have perf numbers any more - no idea whether this is better than 
what you have already - probably not, but it might be worth trying castShape?
   
   I'm using convertShape, which is the same thing, except that it allows a 
little more flexibility: instead of casting by Class, you can use operators 
(e.g. zero extension). FYI, your code looks to have a correctness issue, as it 
accumulates into `short`, which is unsafe.
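
To illustrate why accumulating into `short` is unsafe, here is a minimal scalar sketch. It is not the code from the PR; the class and method names are hypothetical, and a plain loop stands in for a vector lane. The product of two bytes can be as large as 127 * 127 = 16129, so a `short` accumulator (max 32767) wraps after only three such products, while an `int` accumulator does not.

```java
public class ShortAccumulatorOverflow {
    // Hypothetical helper: dot product accumulated into a short.
    // The compound assignment narrows back to short silently, so the
    // sum wraps around once it exceeds Short.MAX_VALUE (32767).
    static short dotShort(byte[] a, byte[] b) {
        short sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (short) (a[i] * b[i]);
        }
        return sum;
    }

    // Hypothetical helper: same dot product accumulated into an int,
    // which has ample headroom for byte products.
    static int dotInt(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] v = {127, 127, 127};
        System.out.println(dotShort(v, v)); // wrapped, negative value
        System.out.println(dotInt(v, v));   // 3 * 16129 = 48387
    }
}
```

The same reasoning applies per lane in a vectorized loop: widening the products to `int` before accumulating avoids the wraparound.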
   
   See code: 
https://github.com/apache/lucene/pull/12311/commits/3a6cb81d092c240a7dc3938646186a9bfa021900
   Again my question is just how to do it generically with good performance, 
especially for machines with only 128 bit vectors :)
   
   With 256-bit vectors it is fast using ByteVector.SPECIES_64, 
ShortVector.SPECIES_128, and IntVector.SPECIES_256.
   But on ARM, which only has 128-bit vectors, the generic code using only 
"SPECIES_PREFERRED" isn't as fast as it should be: almost 2x, but not 4x like 
on avx-256.
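
As a rough illustration of why the three species above line up on 256-bit hardware, the lane count of a vector is its bit width divided by the element width. The sketch below is only arithmetic (the class and method names are hypothetical and it does not use the Vector API): 64-bit bytes, 128-bit shorts, and 256-bit ints all have 8 lanes, so each widening step maps one vector to one vector. If ints are capped at 128 bits, they have only 4 lanes, and the matching byte vector would have to be 32 bits wide, which is narrower than ByteVector.SPECIES_64.

```java
public class LaneCounts {
    // Hypothetical helper: number of lanes in a vector of the given width.
    static int lanes(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        // 256-bit case: byte, short, and int species all have the same lane count.
        System.out.println(lanes(64, 8));
        System.out.println(lanes(128, 16));
        System.out.println(lanes(256, 32));
        // 128-bit case: int vectors have half as many lanes, so 8 bytes
        // widen into two int vectors instead of one.
        System.out.println(lanes(128, 32));
    }
}
```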
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]