thecoop commented on code in PR #14863: URL: https://github.com/apache/lucene/pull/14863#discussion_r2259683695
########## lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ########## @@ -530,7 +566,41 @@ private int dotProductBody512Int4Packed(byte[] unpacked, byte[] packed, int limi return sum; } - private int dotProductBody256Int4Packed(byte[] unpacked, byte[] packed, int limit) { + private static int dotProductBody512Int4PackedPacked( + ByteVectorLoader a, ByteVectorLoader b, int limit) { + int sum = 0; + // iterate in chunks of 1024 items to ensure we don't overflow the short accumulator + for (int i = 0; i < limit; i += 4096) { + ShortVector acc0 = ShortVector.zero(ShortVector.SPECIES_512); + ShortVector acc1 = ShortVector.zero(ShortVector.SPECIES_512); + int innerLimit = Math.min(limit - i, 4096); + for (int j = 0; j < innerLimit; j += ByteVector.SPECIES_256.length()) { + // packed + var vb8 = b.load(ByteVector.SPECIES_256, i + j); + // packed + var va8 = a.load(ByteVector.SPECIES_256, i + j); + + // upper + ByteVector prod8 = vb8.and((byte) 0x0F).mul(va8.and((byte) 0x0F)); + Vector<Short> prod16 = prod8.convertShape(ZERO_EXTEND_B2S, ShortVector.SPECIES_512, 0); Review Comment: Yeah, specifying it as a reinterpretation rather than shape-changing will be good. If the performance is unaffected, then using the more readable version is preferable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org