msokolov commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3058165787
What I did:

```
@@ -305,7 +306,36 @@ final class PanamaVectorUtilSupport implements VectorUtilSupport {
   @Override
   public int dotProduct(byte[] a, byte[] b) {
-    return dotProduct(MemorySegment.ofArray(a), MemorySegment.ofArray(b));
+    return dotProductWTF(MemorySegment.ofArray(a), MemorySegment.ofArray(b));
+  }
+
+  static int dotProductWTF(MemorySegment a, MemorySegment b) {
+    assert a.byteSize() == b.byteSize();
+    int i = 0;
+    int res = 0;
+
+    // only vectorize if we'll at least enter the loop a single time, and we have at least 128-bit
+    // vectors (256-bit on intel to dodge performance landmines)
+    if (a.byteSize() >= 16 && PanamaVectorConstants.HAS_FAST_INTEGER_VECTORS) {
+      // compute vectorized dot product consistent with VPDPBUSD instruction
+      if (VECTOR_BITSIZE >= 512) {
+        i += BYTE_SPECIES.loopBound(a.byteSize());
+        res += dotProductBody512(a, b, i);
+      } else if (VECTOR_BITSIZE == 256) {
+        i += BYTE_SPECIES.loopBound(a.byteSize());
+        res += dotProductBody256(a, b, i);
+      } else {
+        // tricky: we don't have SPECIES_32, so we workaround with "overlapping read"
+        i += ByteVector.SPECIES_64.loopBound(a.byteSize() - ByteVector.SPECIES_64.length());
+        res += dotProductBody128(a, b, i);
+      }
+    }
+
+    // scalar tail
+    for (; i < a.byteSize(); i++) {
+      res += b.get(JAVA_BYTE, i) * a.get(JAVA_BYTE, i);
+    }
+    return res;
   }
```

where `dotProductWTF` is just a clone of `dotProduct`.
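For readers who want to try the shape of this method outside Lucene, here is a minimal sketch of the same control flow: a chunked main loop that runs up to a loop bound, followed by a scalar tail. It is not the Lucene implementation. It uses plain `byte[]` arrays instead of `MemorySegment`, an ordinary inner loop instead of the Panama Vector API, and the names `DotProductSketch`, `LANES`, `loopBound`, `dotProductScalar`, and `dotProductChunked` are hypothetical, chosen only for illustration.

```java
import java.util.Random;

final class DotProductSketch {
  // Hypothetical chunk width standing in for a vector lane count (must be a power of two).
  static final int LANES = 16;

  /** Largest multiple of LANES that is <= length (length must be non-negative). */
  static int loopBound(int length) {
    return length & -LANES;
  }

  /** Plain scalar reference: the same arithmetic as the "scalar tail" loop. */
  static int dotProductScalar(byte[] a, byte[] b) {
    int res = 0;
    for (int i = 0; i < a.length; i++) {
      res += a[i] * b[i];
    }
    return res;
  }

  /** Chunked main loop up to the loop bound, then a scalar tail for the remainder. */
  static int dotProductChunked(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("arrays must have equal length");
    }
    int i = 0;
    int res = 0;

    // Main loop: processes LANES elements per iteration (a stand-in for a vectorized body).
    int bound = loopBound(a.length);
    for (; i < bound; i += LANES) {
      int acc = 0;
      for (int lane = 0; lane < LANES; lane++) {
        acc += a[i + lane] * b[i + lane];
      }
      res += acc;
    }

    // Scalar tail: handles the fewer-than-LANES elements left over.
    for (; i < a.length; i++) {
      res += a[i] * b[i];
    }
    return res;
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    // Lengths chosen to hit the empty, tail-only, exact-multiple, and mixed cases.
    for (int len : new int[] {0, 1, 15, 16, 17, 100}) {
      byte[] a = new byte[len];
      byte[] b = new byte[len];
      random.nextBytes(a);
      random.nextBytes(b);
      if (dotProductScalar(a, b) != dotProductChunked(a, b)) {
        throw new AssertionError("mismatch at length " + len);
      }
    }
    System.out.println("ok");
  }
}
```

Comparing a cloned or restructured method against a scalar reference like this only checks that the results agree. It says nothing about how the JIT compiles either version.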