Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

via GitHub Wed, 23 Oct 2024 12:07:39 -0700


msokolov commented on code in PR #13572:
URL: https://github.com/apache/lucene/pull/13572#discussion_r1813272520



##########
lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java:
##########
@@ -84,6 +91,76 @@ public void init() {
       floatsA[i] = random.nextFloat();
       floatsB[i] = random.nextFloat();
     }
+    // Java 21+ specific initialization
+    final int runtimeVersion = Runtime.version().feature();
+    if (runtimeVersion >= 21) {
+      // Reflection based code to eliminate the use of Preview classes in JMH benchmarks
+      try {
+        final Class<?> vectorUtilSupportClass = VectorUtil.getVectorUtilSupportClass();
+        final var className = "org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport";
+        if (vectorUtilSupportClass.getName().equals(className) == false) {
+          nativeBytesA = null;
+          nativeBytesB = null;
+        } else {
+          MethodHandles.Lookup lookup = MethodHandles.lookup();
+          final var MemorySegment = "java.lang.foreign.MemorySegment";
+          final var methodType =
+              MethodType.methodType(lookup.findClass(MemorySegment), byte[].class);
+          MethodHandle nativeMemorySegment =
+              lookup.findStatic(vectorUtilSupportClass, "nativeMemorySegment", methodType);
+          byte[] a = new byte[size];

Review Comment:
   I'm a little confused: is this setup code for the benchmark? We just run
it once and then run the dot product on the same two vectors many times? I wonder
if we would see something different if we generated a large number of vectors
and randomized which pair we compare on each run. Also, would performance vary
if the vectors are laid out sequentially in one buffer (i.e., vector 0 starts at
offset 0, vector 1 starts at offset `size`, ...)?
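A minimal sketch of the setup suggested above, written as plain Java rather than as a JMH `@State` class: many int8 vectors are packed back to back in one `byte[]` (vector `i` starts at offset `i * dims`), and each call scores a randomly chosen pair. The class and method names are hypothetical, and the scalar loop is only a stand-in for the native dot product being benchmarked.

```java
import java.util.Random;

/**
 * Hypothetical benchmark-state sketch: numVectors int8 vectors stored sequentially in one
 * buffer, with a fresh random pair compared on every call.
 */
final class RandomPairDotProductSketch {
  private final byte[] packed; // numVectors * dims bytes; vector i starts at i * dims
  private final int dims;
  private final int numVectors;
  private final Random random;

  RandomPairDotProductSketch(int numVectors, int dims, long seed) {
    this.numVectors = numVectors;
    this.dims = dims;
    this.random = new Random(seed);
    this.packed = new byte[numVectors * dims];
    random.nextBytes(packed); // fill every vector with random int8 values
  }

  /** Scalar int8 dot product of the vectors at ordinals a and b (stand-in for the native call). */
  int dotProduct(int a, int b) {
    int offA = a * dims;
    int offB = b * dims;
    int sum = 0;
    for (int i = 0; i < dims; i++) {
      sum += packed[offA + i] * packed[offB + i]; // bytes are widened to int before multiplying
    }
    return sum;
  }

  /** What a benchmark method would call: pick a new random pair on each invocation. */
  int dotProductRandomPair() {
    return dotProduct(random.nextInt(numVectors), random.nextInt(numVectors));
  }
}
```

In a real JMH benchmark the result of `dotProductRandomPair()` would be returned or passed to a `Blackhole` so the JIT cannot eliminate the call; the random-number generation then becomes part of the measured cost, which is one reason to precompute the pair indices instead.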



##########
lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorer.java:
##########
@@ -34,6 +37,8 @@ abstract sealed class Lucene99MemorySegmentByteVectorScorer
   final MemorySegmentAccessInput input;
   final MemorySegment query;
   byte[] scratch;
+  MemorySegment offHeapScratch;

Review Comment:
   I'm concerned about the cost of creating these for every scorer() we create,
because that happens a lot: during indexing, we create multiple scorers while
adding each new document. Could we move these to the ScorerSupplier instead?
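A sketch of the alternative the reviewer proposes, assuming that all scorers handed out by one supplier are used by a single thread: the supplier allocates the off-heap scratch once, and each scorer it creates reuses that buffer instead of allocating its own. `SharedScratchScorerSupplier` and its nested `Scorer` are hypothetical stand-ins rather than Lucene's actual scorer-supplier API, and `ByteBuffer.allocateDirect` substitutes for a `MemorySegment`.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: one off-heap scratch buffer per supplier, shared by every scorer it
 * creates. Assumes single-threaded use of the scorers from one supplier.
 */
final class SharedScratchScorerSupplier {
  private final byte[][] vectors; // stand-in for the on-disk vector data
  private final ByteBuffer offHeapScratch; // allocated once, not once per scorer

  SharedScratchScorerSupplier(byte[][] vectors, int dims) {
    this.vectors = vectors;
    this.offHeapScratch = ByteBuffer.allocateDirect(dims);
  }

  /** Cheap to call per document: allocates only the small scorer object. */
  Scorer scorer(byte[] query) {
    return new Scorer(query);
  }

  final class Scorer {
    private final byte[] query;

    Scorer(byte[] query) {
      this.query = query;
    }

    /** Copies the candidate into the shared scratch, then computes the int8 dot product. */
    int score(int node) {
      offHeapScratch.clear();
      offHeapScratch.put(vectors[node]);
      int sum = 0;
      for (int i = 0; i < query.length; i++) {
        sum += query[i] * offHeapScratch.get(i); // absolute get, independent of position
      }
      return sum;
    }

    /** Exposed only to show that scorers share the supplier's buffer. */
    ByteBuffer scratch() {
      return offHeapScratch;
    }
  }
}
```

The trade-off is thread safety: sharing the buffer is only correct if each thread gets its own supplier (or its own copy of one).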



##########
lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentScalarQuantizedVectorScorer.java:
##########
@@ -0,0 +1,407 @@
+/*

Review Comment:
   Can you say where this file came from? Was it mostly copied from another
file, or is it all brand new?



##########
lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorer.java:
##########
@@ -103,6 +125,27 @@ public float score(int node) throws IOException {
     }
   }
 
+  static final class NativeDotProductScorer extends Lucene99MemorySegmentByteVectorScorer {
+
+    NativeDotProductScorer(
+        MemorySegmentAccessInput input, KnnVectorValues values, byte[] queryVector) {
+      super(input, values, queryVector);
+      if (offHeapQuery == null) {

Review Comment:
   How would this ever not be null? Can we assert `offHeapQuery == null` instead?
Maybe we could make it final?
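One way to read the "make it final" suggestion, as a hypothetical sketch: if the field is `final` and assigned exactly once in the constructor, the `if (offHeapQuery == null)` guard has nothing to protect against, because the compiler rejects any second assignment. The class name is illustrative, and a direct `ByteBuffer` stands in for the off-heap `MemorySegment`.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: the off-heap copy of the query is a final field set once. */
final class FinalOffHeapQueryScorer {
  private final ByteBuffer offHeapQuery;

  FinalOffHeapQueryScorer(byte[] queryVector) {
    // No null check needed: a final field can be assigned only once, here.
    ByteBuffer buf = ByteBuffer.allocateDirect(queryVector.length);
    buf.put(queryVector);
    buf.flip(); // limit = number of query bytes, position = 0
    this.offHeapQuery = buf;
  }

  byte queryByte(int i) {
    return offHeapQuery.get(i);
  }

  int dims() {
    return offHeapQuery.limit();
  }
}
```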



##########
lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorer.java:
##########
@@ -34,6 +37,8 @@ abstract sealed class Lucene99MemorySegmentByteVectorScorer
   final MemorySegmentAccessInput input;
   final MemorySegment query;
   byte[] scratch;
+  MemorySegment offHeapScratch;

Review Comment:
   Hmm, now I see we did do that! But then I wonder why we also need to do it
here.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/OffHeapQuantizedByteVectorValues.java:
##########
@@ -146,6 +146,7 @@ public float getScoreCorrectionConstant(int targetOrd) throws IOException {
     }
     slice.seek(((long) targetOrd * byteSize) + numBytes);
     slice.readFloats(scoreCorrectionConstant, 0, 1);
+    lastOrd = targetOrd;

Review Comment:
   This looks like a bug: if we set `lastOrd` here and then call `vectorValue`,
we could get the wrong result, because we don't read the binaryValue here.
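A hypothetical sketch of the invariant at stake, assuming `lastOrd` means "`binaryValue` currently holds the vector for `lastOrd`": the buggy variant updates `lastOrd` while reading only the correction constant, so a following `vectorValue` call for that ordinal hits the cache and returns stale bytes. Leaving `lastOrd` untouched is one fix; reading the vector bytes as well would be another. The class and its in-memory arrays are illustrative stand-ins for `OffHeapQuantizedByteVectorValues` and its input slice.

```java
/** Hypothetical sketch of a single-entry vector cache keyed by lastOrd. */
final class CachedQuantizedValues {
  private final byte[][] vectors; // stand-in for the quantized vector bytes on disk
  private final float[] corrections; // stand-in for the per-vector score correction constants
  private final byte[] binaryValue; // cached copy of the vector for lastOrd
  private int lastOrd = -1;

  CachedQuantizedValues(byte[][] vectors, float[] corrections) {
    this.vectors = vectors;
    this.corrections = corrections;
    this.binaryValue = new byte[vectors[0].length];
  }

  byte[] vectorValue(int ord) {
    if (ord == lastOrd) {
      return binaryValue; // cache hit: trusts that binaryValue was filled for lastOrd
    }
    System.arraycopy(vectors[ord], 0, binaryValue, 0, binaryValue.length);
    lastOrd = ord;
    return binaryValue;
  }

  /** Keeps the invariant: reads only the constant and leaves lastOrd alone. */
  float getScoreCorrectionConstant(int ord) {
    return corrections[ord];
  }

  /** Mirrors the reviewed change: moves lastOrd without refilling binaryValue. */
  float getScoreCorrectionConstantBuggy(int ord) {
    lastOrd = ord;
    return corrections[ord];
  }
}
```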



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org
