Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]

via GitHub Sun, 01 Jun 2025 00:14:17 -0700


vigyasharma commented on code in PR #14708:
URL: https://github.com/apache/lucene/pull/14708#discussion_r2118820224



##########
lucene/core/src/java/org/apache/lucene/search/ByteVectorSimilarityValuesSource.java:
##########
@@ -42,7 +78,35 @@ public VectorScorer getScorer(LeafReaderContext ctx) throws 
IOException {
       ByteVectorValues.checkField(ctx.reader(), fieldName);
       return null;
     }
-    return vectorValues.scorer(queryVector);
+    final FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo(fieldName);
+    if (fi.getVectorDimension() != queryVector.length) {
+      throw new IllegalArgumentException(
+          "Query vector dimension does not match field dimension: "
+              + queryVector.length
+              + " != "
+              + fi.getVectorDimension());
+    }
+
+    // default vector scorer
+    if (useFullPrecision == false) {
+      return vectorValues.scorer(queryVector);
+    }
+
+    final VectorSimilarityFunction vectorSimilarityFunction = 
fi.getVectorSimilarityFunction();
+    return new VectorScorer() {
+      final KnnVectorValues.DocIndexIterator iterator = 
vectorValues.iterator();
+
+      @Override
+      public float score() throws IOException {
+        return vectorSimilarityFunction.compare(
+            queryVector, vectorValues.vectorValue(iterator.index()));

Review Comment:
   I do remember reading some results on the DiskANN issue where benchmarks 
indicated that having the vectors needed for ANN graph search in memory (the 
quantized vectors in this case), does lead to better performance. So maybe, an 
option to use only DIRECT_IO for this makes sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]

Reply via email to