benwtrent commented on code in PR #12806: URL: https://github.com/apache/lucene/pull/12806#discussion_r1393164542
########## lucene/core/src/test/org/apache/lucene/search/BaseKnnVectorQueryTestCase.java: ########## @@ -779,6 +781,16 @@ Directory getIndexStore( doc.add(getKnnVectorField(field, contents[i], vectorSimilarityFunction)); doc.add(new StringField("id", "id" + i, Field.Store.YES)); writer.addDocument(doc); + if (randomBoolean()) { Review Comment: Adding docs in between vector docs to ensure our sparse vector reader is adequately exercised. Simple improvement in coverage here. ########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -200,6 +206,12 @@ protected TopDocs exactSearch(LeafReaderContext context, DocIdSetIterator accept return new TopDocs(totalHits, topScoreDocs); } + /** + * @param ctx the leaf reader context + * @return the number of vectors in the given leaf. + */ + protected abstract int numVectorsInLeaf(LeafReaderContext ctx) throws IOException; Review Comment: This is cheap, we store it per field. This way we have a good count even when not every document has a vector. ########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -110,6 +110,12 @@ private TopDocs getLeafResults(LeafReaderContext ctx, Weight filterWeight) throw int maxDoc = ctx.reader().maxDoc(); if (filterWeight == null) { + int cost = liveDocs == null ? maxDoc : liveDocs.length(); Review Comment: I am fairly ignorant around what `liveDocs.length()` means. Here I am assuming if there are `liveDocs`, this iterator might actually be smaller than `maxDoc`. ########## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ########## @@ -171,13 +177,13 @@ protected TopDocs exactSearch(LeafReaderContext context, DocIdSetIterator accept } VectorScorer vectorScorer = createVectorScorer(context, fi); + // Iterate over the vector docs that match the filter and find the top k + DocIdSetIterator iterator = + ConjunctionUtils.intersectIterators(List.of(acceptIterator, vectorScorer.iterator())); Review Comment: An intersection of the vector values and the user provided filter seemed like the way it should have always been. Do we want to remove the `fieldExists` query logic when a user provides a pre-filter? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org