kaivalnp commented on code in PR #12922:
URL: https://github.com/apache/lucene/pull/12922#discussion_r1424377622


##########
lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java:
##########
@@ -255,6 +255,11 @@ static VectorSimilarityScorer fromAcceptDocs(
           new FilteredDocIdSetIterator(acceptDocs) {
             @Override
             protected boolean match(int doc) throws IOException {
+              // Advance the scorer
+              if (!scorer.advanceExact(doc)) {
+                return false;
+              }
+

Review Comment:
   > This tells me that none of the filtered cases drop to using exact search?
   
   We do have a test case that covers this: 
[`testRandomFilter`](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java#L158)
 randomly [filters a 
sub-range](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java#L160-L161)
 of documents and [expects all of them to be 
found](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java#L183-L184)
 -- which always falls back to exact search (because a `traversalSimilarity` of 
`Float.NEGATIVE_INFINITY` 
[here](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java#L172)
 will visit the entire graph and exhaust any limit)
   
   While running this test, I put debug points 
[here](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java#L259)
 to check the `scorer`, and it was always unpositioned (`docid` = -1), but it 
did not throw an error in *many* cases
   
   The error is thrown when the underlying 
[`values`](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/java/org/apache/lucene/search/VectorScorer.java#L96)
 are randomly chosen to be `SimpleTextFloatVectorValues` - try `./gradlew test 
--tests TestFloatVectorSimilarityQuery.testRandomFilter 
-Dtests.seed=119135B1F0803918` for a failing case (as mentioned 
[here](https://github.com/apache/lucene/pull/12679#issuecomment-1851062374), 
thanks @epotyom!)
   
   In other cases (for example when `values` are `DenseOffHeapVectorValues`), 
the scorer *does not fail even when unpositioned* and returns some (garbage) 
value - but the count of results is still correct (and 
[asserted](https://github.com/apache/lucene/blob/e18f9b1eb04a70d3fe5d8431a3d6724f187f050c/lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java#L180-L184))
   
   I wonder if we can put an `assert values.docID() != -1` before 
[this](https://github.com/apache/lucene/blob/1ac1b1cadc66364c8baca48e8334aa2855cd18b6/lucene/core/src/java/org/apache/lucene/search/VectorScorer.java#L90)
 and 
[this](https://github.com/apache/lucene/blob/1ac1b1cadc66364c8baca48e8334aa2855cd18b6/lucene/core/src/java/org/apache/lucene/search/VectorScorer.java#L120)
 line to ensure it is positioned before trying to compute scores? -- I tried 
this offline, and the issue was caught immediately
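
   To illustrate the idea, here is a minimal, self-contained sketch. It is not Lucene code: `ToyVectorValues` and `ToyScorer` are hypothetical stand-ins for the vector values and the scorer, and the check is written as an explicit `IllegalStateException` instead of an `assert` only so that it fires without `-ea`. The point is that an unpositioned scorer fails loudly instead of returning an arbitrary value, and that calling `advanceExact(doc)` first (as the change above does) makes the score well defined.

```java
import java.util.Objects;

final class PositionedScorerSketch {

  /** Hypothetical stand-in for a per-document vector reader; not Lucene's API. */
  static final class ToyVectorValues {
    private final float[][] vectors; // vectors[doc] holds the vector of document `doc`
    private int doc = -1; // -1 means "unpositioned"

    ToyVectorValues(float[][] vectors) {
      this.vectors = Objects.requireNonNull(vectors);
    }

    int docID() {
      return doc;
    }

    /** Positions the reader on `target`; returns false if no such document exists. */
    boolean advanceExact(int target) {
      if (target < 0 || target >= vectors.length) {
        return false;
      }
      doc = target;
      return true;
    }

    float[] vectorValue() {
      return vectors[doc];
    }
  }

  /** Hypothetical scorer that refuses to score while unpositioned. */
  static final class ToyScorer {
    private final ToyVectorValues values;
    private final float[] query;

    ToyScorer(ToyVectorValues values, float[] query) {
      this.values = Objects.requireNonNull(values);
      this.query = Objects.requireNonNull(query);
    }

    boolean advanceExact(int doc) {
      return values.advanceExact(doc);
    }

    /** Dot-product score of the query against the current document's vector. */
    float score() {
      // Explicit equivalent of the proposed `assert values.docID() != -1`.
      if (values.docID() == -1) {
        throw new IllegalStateException("scorer is unpositioned");
      }
      float[] vector = values.vectorValue();
      float dot = 0f;
      for (int i = 0; i < query.length; i++) {
        dot += query[i] * vector[i];
      }
      return dot;
    }
  }

  public static void main(String[] args) {
    ToyVectorValues values = new ToyVectorValues(new float[][] {{1f, 0f}, {0f, 2f}});
    ToyScorer scorer = new ToyScorer(values, new float[] {1f, 1f});
    try {
      scorer.score(); // not positioned yet, so the guard fires
    } catch (IllegalStateException e) {
      System.out.println("caught: " + e.getMessage());
    }
    if (scorer.advanceExact(1)) { // mirrors the advanceExact call added in match()
      System.out.println("score = " + scorer.score());
    }
  }
}
```

   In Lucene itself the check would stay an `assert`, so it costs nothing in production and only fires in tests that run with assertions enabled.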



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

