msokolov commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1813272520
########## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ########## @@ -84,6 +91,76 @@ public void init() { floatsA[i] = random.nextFloat(); floatsB[i] = random.nextFloat(); } + // Java 21+ specific initialization + final int runtimeVersion = Runtime.version().feature(); + if (runtimeVersion >= 21) { + // Reflection based code to eliminate the use of Preview classes in JMH benchmarks + try { + final Class<?> vectorUtilSupportClass = VectorUtil.getVectorUtilSupportClass(); + final var className = "org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport"; + if (vectorUtilSupportClass.getName().equals(className) == false) { + nativeBytesA = null; + nativeBytesB = null; + } else { + MethodHandles.Lookup lookup = MethodHandles.lookup(); + final var MemorySegment = "java.lang.foreign.MemorySegment"; + final var methodType = + MethodType.methodType(lookup.findClass(MemorySegment), byte[].class); + MethodHandle nativeMemorySegment = + lookup.findStatic(vectorUtilSupportClass, "nativeMemorySegment", methodType); + byte[] a = new byte[size]; Review Comment: I'm a little confused -- is this setup code for the benchmark? We just run it once and then run dot product on the same two vectors many times? I wonder if we would see something different if we generated a large number of vectors and randomized which ones we compare on each run. Also would performance vary if the vectors are sequential in their buffer (ie vector 0 starts at 0, vector 1 starts at size...) ########## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorer.java: ########## @@ -34,6 +37,8 @@ abstract sealed class Lucene99MemorySegmentByteVectorScorer final MemorySegmentAccessInput input; final MemorySegment query; byte[] scratch; + MemorySegment offHeapScratch; Review Comment: I'm concerned about the cost of creating these for every scorer() we create because that happens a lot. During indexing, we create multiple scorers while adding each new document. Could we move these to the ScorerSupplier instead? ########## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentScalarQuantizedVectorScorer.java: ########## @@ -0,0 +1,407 @@ +/* Review Comment: can you say where this file came from? Was it mostly copied from some other file, or is it all brand new? ########## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorer.java: ########## @@ -103,6 +125,27 @@ public float score(int node) throws IOException { } } + static final class NativeDotProductScorer extends Lucene99MemorySegmentByteVectorScorer { + + NativeDotProductScorer( + MemorySegmentAccessInput input, KnnVectorValues values, byte[] queryVector) { + super(input, values, queryVector); + if (offHeapQuery == null) { Review Comment: how would this ever not be null? can we assert offHeapQuery == null instead? Maybe we could make it final? ########## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorer.java: ########## @@ -34,6 +37,8 @@ abstract sealed class Lucene99MemorySegmentByteVectorScorer final MemorySegmentAccessInput input; final MemorySegment query; byte[] scratch; + MemorySegment offHeapScratch; Review Comment: Hmm now I see we did do that! But then I wonder why we also need to do it here. ########## lucene/core/src/java/org/apache/lucene/codecs/lucene99/OffHeapQuantizedByteVectorValues.java: ########## @@ -146,6 +146,7 @@ public float getScoreCorrectionConstant(int targetOrd) throws IOException { } slice.seek(((long) targetOrd * byteSize) + numBytes); slice.readFloats(scoreCorrectionConstant, 0, 1); + lastOrd = targetOrd; Review Comment: this looks like a bug because if we do this and then call `vectorValue` we could get the wrong result because we don't calculate the binaryValue here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org