msokolov commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r801992612
##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java
##########
@@ -253,6 +277,36 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs) thro
       scoreDocs);
   }
 
+  private TopDocs exactSearch(
+      float[] target,
+      int k,
+      VectorSimilarityFunction similarityFunction,
+      VectorValues vectorValues,
+      DocIdSetIterator acceptIterator)
+      throws IOException {
+    HitQueue topK = new HitQueue(k, false);
+    int numVisited = 0;
+
+    int doc;
+    while ((doc = acceptIterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {

Review comment:
   should we call `advance(vectorValues.docID())` here to enable skipping?


##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java
##########
@@ -227,16 +231,36 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs) thro
     // bound k by total number of vectors to prevent oversizing data structures
     k = Math.min(k, fieldEntry.size());
-    OffHeapVectorValues vectorValues = getOffHeapVectorValues(fieldEntry);
+
+    DocIdSetIterator acceptIterator = null;
+    int visitedLimit = Integer.MAX_VALUE;
+
+    if (acceptDocs instanceof BitSet acceptBitSet) {

Review comment:
   I'm not super-familiar with other algorithms, but it does make sense to me that any approximate algorithm is going to have a "tuning" knob that increases recall in exchange for increased cost. This was the idea behind the now-defunct "fanout" parameter we had in the earlier version of the vector search API. So -- it makes sense to me that we are now bringing back some measure of control over this tuning, albeit in a different form.


##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java
##########
@@ -227,16 +231,36 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs) thro
     // bound k by total number of vectors to prevent oversizing data structures
     k = Math.min(k, fieldEntry.size());
-    OffHeapVectorValues vectorValues = getOffHeapVectorValues(fieldEntry);
+
+    DocIdSetIterator acceptIterator = null;
+    int visitedLimit = Integer.MAX_VALUE;
+
+    if (acceptDocs instanceof BitSet acceptBitSet) {

Review comment:
   @jpountz as always brings up interesting points!
   - I had no idea we were concerned about the number of subclasses of BitSet, nor was I aware of ExitableDirectoryReader! But I wonder if that should determine the approach here -- should we rely on Bits-based termination, or should we instrument `VectorValues`?
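
The loop body after the `while` line is cut off in the first hunk, so the "skipping" question is easier to see with a sketch. The following is only an illustration of one way the intersection could leapfrog between the accept iterator and the vector iterator; the class and method names are made up, the scoring and bookkeeping details are assumptions rather than the PR's actual code, and it relies on `VectorValues` being a `DocIdSetIterator` as in Lucene 9.x.

```java
import java.io.IOException;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.index.VectorValues;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.HitQueue;
import org.apache.lucene.search.ScoreDoc;

/** Hypothetical illustration only; not the PR's implementation. */
final class ExactSearchLeapfrogSketch {

  /**
   * Scores the intersection of the accept iterator and the vector iterator,
   * letting each iterator skip over the other's gaps instead of driving the
   * loop with {@code acceptIterator.nextDoc()} alone. Fills {@code topK} and
   * returns the number of documents actually scored.
   */
  static int scoreIntersection(
      float[] target,
      VectorSimilarityFunction similarityFunction,
      VectorValues vectorValues,
      DocIdSetIterator acceptIterator,
      HitQueue topK)
      throws IOException {
    int numVisited = 0;
    int doc = acceptIterator.nextDoc();
    while (doc != DocIdSetIterator.NO_MORE_DOCS) {
      // Position the vector iterator at or beyond the accepted doc.
      int vectorDoc =
          vectorValues.docID() < doc ? vectorValues.advance(doc) : vectorValues.docID();
      if (vectorDoc == DocIdSetIterator.NO_MORE_DOCS) {
        break;
      }
      if (vectorDoc == doc) {
        // Both iterators are on the same doc: compute the similarity and collect it.
        float score = similarityFunction.compare(target, vectorValues.vectorValue());
        topK.insertWithOverflow(new ScoreDoc(doc, score));
        numVisited++;
        doc = acceptIterator.nextDoc();
      } else {
        // The vector iterator jumped past the accepted doc; this is where
        // acceptIterator.advance(...) lets the accept side skip ahead as well.
        doc = acceptIterator.advance(vectorDoc);
      }
    }
    return numVisited;
  }
}
```

Whether the `advance` call buys anything depends on the accept iterator supporting efficient skipping (for example a `BitSetIterator` over a sparse filter); driving the loop with `nextDoc()` alone visits every accepted doc even in regions where no documents have vectors.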
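
For the `acceptDocs instanceof BitSet` discussion, here is a minimal sketch of the pattern under review, assuming the surrounding code derives both the accept iterator and a visit budget from the bit set; the `toAcceptIterator` helper and the use of `approximateCardinality()` are hypothetical, not the PR's code.

```java
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSet;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.Bits;

final class AcceptDocsSketch {
  /**
   * Hypothetical helper: when the accepted docs happen to be a BitSet
   * (e.g. FixedBitSet or SparseFixedBitSet), expose them as a skip-capable
   * iterator. Other Bits implementations yield null, matching the default
   * value of acceptIterator in the diff above.
   */
  static DocIdSetIterator toAcceptIterator(Bits acceptDocs) {
    if (acceptDocs instanceof BitSet acceptBitSet) {
      // The cardinality doubles as a cheap cost estimate; it is also the kind of
      // quantity a visitedLimit could be derived from, capping how many nodes the
      // approximate search may touch before falling back to exact search.
      return new BitSetIterator(acceptBitSet, acceptBitSet.approximateCardinality());
    }
    return null;
  }
}
```

Presumably the same cardinality is what `visitedLimit` is computed from, so that a very restrictive filter sends the query down the `exactSearch` path instead of an HNSW traversal that would visit more nodes than a brute-force scan.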