kaivalnp commented on code in PR #12857: URL: https://github.com/apache/lucene/pull/12857#discussion_r1412393818
########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -118,13 +118,38 @@ private TopDocs getLeafResults(LeafReaderContext ctx, Weight filterWeight) throw return NO_RESULTS; } - BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc); - int cost = acceptDocs.cardinality(); + DocIdSetIterator iterator = scorer.iterator(); + BitSet bitSet = + iterator instanceof BitSetIterator + ? ((BitSetIterator) iterator).getBitSet() + : BitSet.of(iterator, maxDoc); + Bits acceptDocs = + new Bits() { + @Override + public boolean get(int index) { + return liveDocs != null ? liveDocs.get(index) & bitSet.get(index) : bitSet.get(index); + } + + @Override + public int length() { + return bitSet.cardinality(); + } Review Comment: `length()` is more like the [maximum doc](https://github.com/apache/lucene/blob/0e96b9cd8c1267acee02684bcac5be89d60b3bf4/lucene/core/src/java/org/apache/lucene/util/Bits.java#L73) you can request, this should be `bitSet.length()`? ########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -118,13 +118,38 @@ private TopDocs getLeafResults(LeafReaderContext ctx, Weight filterWeight) throw return NO_RESULTS; } - BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc); - int cost = acceptDocs.cardinality(); + DocIdSetIterator iterator = scorer.iterator(); + BitSet bitSet = + iterator instanceof BitSetIterator + ? ((BitSetIterator) iterator).getBitSet() + : BitSet.of(iterator, maxDoc); + Bits acceptDocs = + new Bits() { + @Override + public boolean get(int index) { + return liveDocs != null ? liveDocs.get(index) & bitSet.get(index) : bitSet.get(index); Review Comment: nit: Can we make this cleaner by `return (liveDocs == null || liveDocs.get(index)) && bitSet.get(index)` ? ########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -118,13 +118,38 @@ private TopDocs getLeafResults(LeafReaderContext ctx, Weight filterWeight) throw return NO_RESULTS; } - BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc); - int cost = acceptDocs.cardinality(); + DocIdSetIterator iterator = scorer.iterator(); + BitSet bitSet = + iterator instanceof BitSetIterator + ? ((BitSetIterator) iterator).getBitSet() + : BitSet.of(iterator, maxDoc); + Bits acceptDocs = + new Bits() { Review Comment: Perhaps we can wrap the `BitSet` in a new `Bits` only for the case we're trying to optimize (when `iterator instanceof BitSetIterator`) -- not changing the `BitSet.of` flow like @shubhamvishu also mentioned? ########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -118,13 +118,38 @@ private TopDocs getLeafResults(LeafReaderContext ctx, Weight filterWeight) throw return NO_RESULTS; } - BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc); - int cost = acceptDocs.cardinality(); + DocIdSetIterator iterator = scorer.iterator(); + BitSet bitSet = + iterator instanceof BitSetIterator + ? ((BitSetIterator) iterator).getBitSet() + : BitSet.of(iterator, maxDoc); + Bits acceptDocs = + new Bits() { + @Override + public boolean get(int index) { + return liveDocs != null ? liveDocs.get(index) & bitSet.get(index) : bitSet.get(index); + } + + @Override + public int length() { + return bitSet.cardinality(); + } + }; + + FilteredDocIdSetIterator filteredDocIdSetIterator = + new FilteredDocIdSetIterator(new BitSetIterator(bitSet, maxDoc)) { + @Override + protected boolean match(int doc) { + return liveDocs == null || liveDocs.get(doc); + } + }; + + int cost = acceptDocs.length(); Review Comment: The `cost` here determines what limit to set for `#approximateSearch` If we use `acceptDocs.length()`, this will be equal to `maxDoc` (and always complete graph search without falling back to exact search, even when we want to..) Perhaps this should be `acceptDocs.cardinality()`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org