rmuir commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1557149306
> I don't have perf numbers any more - no idea whether this is better than what you have already - probably not, but it might be worth trying castShape?

I'm using convertShape, which is the same thing, only it allows a little more flexibility: instead of casting by Class, you can use operators (e.g. zero extension).

FYI, your code looks to have a correctness issue, as it accumulates into `short`, which is unsafe.

See code: https://github.com/apache/lucene/pull/12311/commits/3a6cb81d092c240a7dc3938646186a9bfa021900

Again, my question is just how to do it generically with good performance, especially for machines with only 128-bit vectors :)

With 256-bit vectors it is fast using ByteVector.SPECIES_64, ShortVector.SPECIES_128, and IntVector.SPECIES_256.

But for ARM, which only has 128-bit vectors, the generic code using only "SPECIES_PREFERRED" isn't as fast as it should be: almost 2x but not 4x like on avx-256.
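To illustrate why accumulating byte products into `short` is unsafe, here is a minimal scalar sketch. It is plain Java rather than the Vector API code from the PR, and the class and method names are hypothetical, chosen only for this example. Each product of two bytes fits in 16 bits, but the running sum does not: three products of 127 * 127 already exceed Short.MAX_VALUE.

```java
// Hypothetical illustration, not code from the PR: scalar byte dot product
// with a 16-bit accumulator versus a 32-bit accumulator.
final class DotProductOverflow {

    // Accumulates into a short. The compound assignment narrows the
    // running sum back to 16 bits on every step, so it wraps silently.
    static short dotShortAccumulator(byte[] a, byte[] b) {
        short sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (short) (a[i] * b[i]);
        }
        return sum;
    }

    // Accumulates into an int, which has ample headroom for sums of
    // byte products over any realistic vector dimension.
    static int dotIntAccumulator(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] v = {127, 127, 127};
        System.out.println(dotShortAccumulator(v, v)); // -17149 (wrapped)
        System.out.println(dotIntAccumulator(v, v));   // 48387
    }
}
```

The same reasoning applies lane by lane in vectorized code: widening bytes to shorts is enough for the multiply, but the accumulation needs int lanes.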
