msokolov commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r801992612
##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java
##########
@@ -253,6 +277,36 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs) thro
       scoreDocs);
   }
 
+  private TopDocs exactSearch(
+      float[] target,
+      int k,
+      VectorSimilarityFunction similarityFunction,
+      VectorValues vectorValues,
+      DocIdSetIterator acceptIterator)
+      throws IOException {
+    HitQueue topK = new HitQueue(k, false);
+    int numVisited = 0;
+
+    int doc;
+    while ((doc = acceptIterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {

Review comment:
   should we call `advance(vectorValues.docID())` here to enable skipping?


##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java
##########
@@ -227,16 +231,36 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs) thro
     // bound k by total number of vectors to prevent oversizing data structures
     k = Math.min(k, fieldEntry.size());
-    OffHeapVectorValues vectorValues = getOffHeapVectorValues(fieldEntry);
+
+    DocIdSetIterator acceptIterator = null;
+    int visitedLimit = Integer.MAX_VALUE;
+
+    if (acceptDocs instanceof BitSet acceptBitSet) {

Review comment:
   I'm not super-familiar with other algorithms, but it does make sense to me that any approximate algorithm is going to have a "tuning" knob that increases recall in exchange for increased cost. This was the idea behind the now-defunct "fanout" parameter we had in the earlier version of the vector search API. So -- it makes sense to me that we are now bringing back some measure of control over this tuning, albeit in a different form.


##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java
##########
@@ -227,16 +231,36 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs) thro
     // bound k by total number of vectors to prevent oversizing data structures
     k = Math.min(k, fieldEntry.size());
-    OffHeapVectorValues vectorValues = getOffHeapVectorValues(fieldEntry);
+
+    DocIdSetIterator acceptIterator = null;
+    int visitedLimit = Integer.MAX_VALUE;
+
+    if (acceptDocs instanceof BitSet acceptBitSet) {

Review comment:
   @jpountz as always brings up interesting points!
   - I had no idea we were concerned about the number of subclasses of BitSet, nor was I aware of ExitableDirectoryReader! But I wonder if that should determine the approach here -- should we rely on Bits-based termination, or should we instrument `VectorValues`?
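
The loop body after the `while` line is cut off in the first hunk, so the "skipping" question is easier to see with a sketch. The following is only an illustration of one way the intersection could leapfrog between the accept iterator and the vector iterator; the class and method names are made up, the scoring and bookkeeping details are assumptions rather than the PR's actual code, and it relies on `VectorValues` being a `DocIdSetIterator` as in Lucene 9.x.

```java
import java.io.IOException;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.index.VectorValues;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.HitQueue;
import org.apache.lucene.search.ScoreDoc;

/** Hypothetical illustration only; not the PR's implementation. */
final class ExactSearchLeapfrogSketch {

  /**
   * Scores the intersection of the accept iterator and the vector iterator,
   * letting each iterator skip over the other's gaps instead of driving the
   * loop with {@code acceptIterator.nextDoc()} alone. Fills {@code topK} and
   * returns the number of documents actually scored.
   */
  static int scoreIntersection(
      float[] target,
      VectorSimilarityFunction similarityFunction,
      VectorValues vectorValues,
      DocIdSetIterator acceptIterator,
      HitQueue topK)
      throws IOException {
    int numVisited = 0;
    int doc = acceptIterator.nextDoc();
    while (doc != DocIdSetIterator.NO_MORE_DOCS) {
      // Position the vector iterator at or beyond the accepted doc.
      int vectorDoc =
          vectorValues.docID() < doc ? vectorValues.advance(doc) : vectorValues.docID();
      if (vectorDoc == DocIdSetIterator.NO_MORE_DOCS) {
        break;
      }
      if (vectorDoc == doc) {
        // Both iterators are on the same doc: compute the similarity and collect it.
        float score = similarityFunction.compare(target, vectorValues.vectorValue());
        topK.insertWithOverflow(new ScoreDoc(doc, score));
        numVisited++;
        doc = acceptIterator.nextDoc();
      } else {
        // The vector iterator jumped past the accepted doc; this is where
        // acceptIterator.advance(...) lets the accept side skip ahead as well.
        doc = acceptIterator.advance(vectorDoc);
      }
    }
    return numVisited;
  }
}
```

Whether the `advance` call buys anything depends on the accept iterator supporting efficient skipping (for example a `BitSetIterator` over a sparse filter); driving the loop with `nextDoc()` alone visits every accepted doc even in regions where no documents have vectors.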
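
For the `acceptDocs instanceof BitSet` discussion, here is a minimal sketch of the pattern under review, assuming the surrounding code derives both the accept iterator and a visit budget from the bit set; the `toAcceptIterator` helper and the use of `approximateCardinality()` are hypothetical, not the PR's code.

```java
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSet;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.Bits;

final class AcceptDocsSketch {
  /**
   * Hypothetical helper: when the accepted docs happen to be a BitSet
   * (e.g. FixedBitSet or SparseFixedBitSet), expose them as a skip-capable
   * iterator. Other Bits implementations yield null, matching the default
   * value of acceptIterator in the diff above.
   */
  static DocIdSetIterator toAcceptIterator(Bits acceptDocs) {
    if (acceptDocs instanceof BitSet acceptBitSet) {
      // The cardinality doubles as a cheap cost estimate; it is also the kind of
      // quantity a visitedLimit could be derived from, capping how many nodes the
      // approximate search may touch before falling back to exact search.
      return new BitSetIterator(acceptBitSet, acceptBitSet.approximateCardinality());
    }
    return null;
  }
}
```

Presumably the same cardinality is what `visitedLimit` is computed from, so that a very restrictive filter sends the query down the `exactSearch` path instead of an HNSW traversal that would visit more nodes than a brute-force scan.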