kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1812941899

   > You still need to score the vectors to realize that they are in the iteration set or not
   
   Right, I meant that we need not score all *other* vectors to determine 
whether a given vector is a "hit" (we just need its similarity score to be 
above the `resultSimilarity`). This is unlike KNN, where membership in the 
`topK` depends on the scores of the other vectors, so it's not a simple 
"filter" like you mentioned
   
   > we do all this work in approximateSearch (because we need to score the values) only to throw it away
   
   I've tried to re-use some of this work to [directly 
reject](https://github.com/apache/lucene/blob/cad565439be512ac6e95a698007b1fc971173f00/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java#L119-L121)
 vectors that are above the `traversalSimilarity` but below the 
`resultSimilarity` (the ones that were [already scored from HNSW 
search](https://github.com/apache/lucene/blob/cad565439be512ac6e95a698007b1fc971173f00/lucene/core/src/java/org/apache/lucene/search/VectorSimilarityCollector.java#L66-L68)),
 without re-computing their scores
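
To illustrate the re-use described above, here is a minimal sketch in plain Java (not the code in the PR). It assumes graph traversal has already produced parallel arrays of doc IDs and scores for everything above the `traversalSimilarity`; the class name, method name, and the inclusive comparison against `resultSimilarity` are assumptions made for illustration.

```java
import java.util.Arrays;

final class ThresholdFilterSketch {

  /**
   * Keeps only the docs whose already-computed score reaches resultSimilarity.
   * No similarity is recomputed: the scores come straight from traversal.
   * The inclusive boundary (>=) is an assumption of this sketch.
   */
  static int[] keepHits(int[] docs, float[] scores, float resultSimilarity) {
    int[] hits = new int[docs.length];
    int count = 0;
    for (int i = 0; i < docs.length; i++) {
      if (scores[i] >= resultSimilarity) {
        hits[count++] = docs[i];
      }
    }
    return Arrays.copyOf(hits, count);
  }

  public static void main(String[] args) {
    // Docs scored during traversal (all above some lower traversal threshold).
    int[] docs = {3, 7, 9};
    float[] scores = {0.95f, 0.70f, 0.85f};
    System.out.println(Arrays.toString(keepHits(docs, scores, 0.80f)));
  }
}
```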
   
   I wonder if we can extend this further: 
[`visited`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L236)
 marks all the nodes for which we have computed scores from HNSW search, and 
anything that is "visited but not collected" will not make it to the final 
results, so it need not be scored again. Could we expose this by passing the 
`visited` variable back to the `KnnCollector`, for example by adding a new 
method like `setVisited(Bits)`?
   
   This is also usable in the current KNN-based search, wherever we fall back 
from `approximateSearch` to `exactSearch`. If the `KnnCollector` had 
information about the vectors we have already scored in graph search (but that 
are not present in the results), we could avoid computing their similarity 
scores again in `exactSearch`, because we already know they are not in the 
`topK`
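
A rough sketch of that fallback in plain Java, independent of the linked commit: `java.util.BitSet` stands in for Lucene's bit-set types, and `Scorer`, `Hit`, and `exactTopK` are hypothetical names. It assumes every visited-but-not-collected doc either failed the filter or scored below the k-th collected hit, so skipping it cannot change the top k.

```java
import java.util.BitSet;
import java.util.PriorityQueue;

final class VisitedAwareExactSearch {

  /** Hypothetical stand-in for whatever computes the query/vector similarity. */
  interface Scorer {
    float score(int doc);
  }

  private static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /**
   * Exact top-k over the docs in filter. Docs marked in visited are never
   * re-scored: the collected ones are seeded with their known scores, and the
   * rest are assumed to have been rejected by graph search already.
   * Returns doc IDs ordered from highest to lowest score.
   */
  static int[] exactTopK(BitSet filter, BitSet visited, int[] collectedDocs,
      float[] collectedScores, int k, Scorer scorer) {
    if (k <= 0) {
      return new int[0];
    }
    // Min-heap on score, so the weakest of the current top k sits on top.
    PriorityQueue<Hit> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    for (int i = 0; i < collectedDocs.length; i++) {
      offer(heap, k, collectedDocs[i], collectedScores[i]);
    }
    for (int doc = filter.nextSetBit(0); doc >= 0; doc = filter.nextSetBit(doc + 1)) {
      if (visited.get(doc)) {
        continue; // already scored during graph search
      }
      offer(heap, k, doc, scorer.score(doc));
    }
    int[] top = new int[heap.size()];
    for (int i = top.length - 1; i >= 0; i--) {
      top[i] = heap.poll().doc; // polls ascending, so fill from the back
    }
    return top;
  }

  private static void offer(PriorityQueue<Hit> heap, int k, int doc, float score) {
    if (heap.size() < k) {
      heap.add(new Hit(doc, score));
    } else if (score > heap.peek().score) {
      heap.poll();
      heap.add(new Hit(doc, score));
    }
  }
}
```

With six filtered docs of which four were visited by graph search, only the two unvisited docs are passed to `scorer`; the saving grows with the share of the filter that graph search already covered.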
   
   Right now we [score all 
vectors](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L177-L187)
 present in the `filter`, even if many of them are already scored and rejected 
in graph search
   
   
[Here](https://github.com/apache/lucene/commit/2d6c0bfd4134b04c60be3864567211c824e7bc3c)
 are some very rough changes to support this -- what do you think @benwtrent?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

