kaivalnp commented on code in PR #12922: URL: https://github.com/apache/lucene/pull/12922#discussion_r1424377622
########## lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java: ##########
@@ -255,6 +255,11 @@ static VectorSimilarityScorer fromAcceptDocs(
           new FilteredDocIdSetIterator(acceptDocs) {
             @Override
             protected boolean match(int doc) throws IOException {
+              // Advance the scorer
+              if (!scorer.advanceExact(doc)) {
+                return false;
+              }
+

Review Comment:

> This tells me that none of the filtered cases drop to using exact search?

We do have a test case that covers this: [`testRandomFilter`](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java#L158) randomly [filters a sub-range](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java#L160-L161) of documents and [expects all of them to be found](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java#L183-L184). This test always falls back to exact search, because a `traversalSimilarity` of `Float.NEGATIVE_INFINITY` [here](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java#L172) visits the entire graph and exhausts any limit.

While running this test, I set breakpoints [here](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java#L259) to inspect the `scorer`: it was always unpositioned (`docID()` was -1), yet in *many* cases no error was thrown.

The error is thrown when the underlying [`values`](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/java/org/apache/lucene/search/VectorScorer.java#L96) are randomly chosen to be
`SimpleTextFloatVectorValues`. Try `./gradlew test --tests TestFloatVectorSimilarityQuery.testRandomFilter -Dtests.seed=119135B1F0803918` for a failing case (as mentioned [here](https://github.com/apache/lucene/pull/12679#issuecomment-1851062374), thanks @epotyom!).

In other cases (for example, when `values` are `DenseOffHeapVectorValues`), the scorer *does not fail even when unpositioned* and returns some (garbage) value, but the count of results is still correct (and [asserted](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java#L180-L184)).

I wonder if we can put an `assert values.docID() != -1` before [this](https://github.com/apache/lucene/blob/1ac1b1cadc66364c8baca48e8334aa2855cd18b6/lucene/core/src/java/org/apache/lucene/search/VectorScorer.java#L90) and [this](https://github.com/apache/lucene/blob/1ac1b1cadc66364c8baca48e8334aa2855cd18b6/lucene/core/src/java/org/apache/lucene/search/VectorScorer.java#L120) line, to ensure the values are positioned before we try to compute scores? I tried this offline, and the issue was caught immediately.
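To make the positioning issue concrete, here is a minimal, self-contained sketch. It does not use Lucene: `ToyVectorValues`, `ToyScorer`, and `matchingDocs` are hypothetical names that only mimic the shape discussed above (a forward-only source that starts unpositioned at `docID() == -1`, a scorer with `advanceExact`, and the filter loop from the diff). Scoring with a plain dot product is an assumption chosen for simplicity. The `assert` in `vectorValue()` stands in for the proposed `assert values.docID() != -1` check and only fires when the JVM runs with `-ea`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch, not Lucene code: every name here is a hypothetical stand-in.
public class PositionedScoringSketch {

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Forward-only vector source that starts unpositioned, like a DocIdSetIterator. */
  static final class ToyVectorValues {
    private final int[] docs; // sorted doc ids that have a vector
    private final float[][] vectors; // vectors[i] belongs to docs[i]
    private int index = -1; // -1 means "not positioned yet"

    ToyVectorValues(int[] docs, float[][] vectors) {
      this.docs = docs;
      this.vectors = vectors;
    }

    int docID() {
      if (index < 0) {
        return -1;
      }
      return index < docs.length ? docs[index] : NO_MORE_DOCS;
    }

    /** Moves to the first doc >= target (target must be > docID()) and returns it. */
    int advance(int target) {
      do {
        index++;
      } while (index < docs.length && docs[index] < target);
      return docID();
    }

    float[] vectorValue() {
      // Stand-in for the proposed check: with -ea, scoring an unpositioned
      // source fails here with a clear message.
      assert docID() != -1 : "vector values are unpositioned";
      return vectors[index];
    }
  }

  /** Scores the current doc of the values against a query vector. */
  static final class ToyScorer {
    private final ToyVectorValues values;
    private final float[] query;

    ToyScorer(ToyVectorValues values, float[] query) {
      this.values = values;
      this.query = query;
    }

    /** Positions the values on doc; returns false if doc has no vector. */
    boolean advanceExact(int doc) {
      int current = values.docID();
      if (current < doc) {
        current = values.advance(doc);
      }
      return current == doc;
    }

    /** Dot product (an assumption for simplicity); only valid after advanceExact. */
    float score() {
      float[] v = values.vectorValue();
      float dot = 0f;
      for (int i = 0; i < v.length; i++) {
        dot += v[i] * query[i];
      }
      return dot;
    }
  }

  /** Same shape as the patched match(): position the scorer first, then score. */
  static List<Integer> matchingDocs(int[] acceptDocs, ToyScorer scorer, float threshold) {
    List<Integer> matches = new ArrayList<>();
    for (int doc : acceptDocs) { // acceptDocs must be sorted ascending
      if (!scorer.advanceExact(doc)) {
        continue; // doc has no vector, so it cannot match
      }
      if (scorer.score() >= threshold) {
        matches.add(doc);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    ToyVectorValues values =
        new ToyVectorValues(new int[] {0, 2, 5}, new float[][] {{1f, 0f}, {0.6f, 0.8f}, {0f, 1f}});
    ToyScorer scorer = new ToyScorer(values, new float[] {1f, 0f});
    System.out.println(matchingDocs(new int[] {0, 1, 2, 5}, scorer, 0.5f)); // prints [0, 2]
  }
}
```

Running `main` prints `[0, 2]`: doc 1 is skipped because it has no vector, and doc 5 scores 0 against the query. If the `advanceExact` call were dropped and `score()` called directly, the very first document would trip the assert under `-ea` (in this toy, running without `-ea` would instead fail on the `-1` array index).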