kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1812941899
> You still need to score the vectors to realize that they are in the iteration set or not

Right, I meant that we need not score all *other* vectors to determine whether a given vector is a "hit" (we just need its similarity score to be above the `resultSimilarity`). That is unlike KNN, where it's not a simple "filter" like you mentioned.

> we do all this work in approximateSearch (because we need to score the values) only to throw it away

I've tried to reuse some of this work to [directly reject](https://github.com/apache/lucene/blob/cad565439be512ac6e95a698007b1fc971173f00/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java#L119-L121) vectors that are above the `traversalSimilarity` but below the `resultSimilarity` (the ones that were [already scored during HNSW search](https://github.com/apache/lucene/blob/cad565439be512ac6e95a698007b1fc971173f00/lucene/core/src/java/org/apache/lucene/search/VectorSimilarityCollector.java#L66-L68)), without recomputing their scores.

I wonder if we can extend this further: [`visited`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L236) marks all the nodes whose scores we computed during HNSW search, and anything that is "visited but not collected" will not make it to the final results. Maybe we can pass the `visited` set back to the `KnnCollector` by adding a new method like `setVisited(Bits)`? This would also be usable in the current KNN-based search, wherever we fall back from `approximateSearch` to `exactSearch`.
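A minimal Java sketch of the `setVisited(Bits)` idea for a similarity-threshold query. Everything here is hypothetical: `VisitedAwareCollector`, `collect`, `isKnownMiss` and `needsScoring` are illustrative names, not Lucene's `KnnCollector` API, and `java.util.BitSet` stands in for Lucene's `Bits`. It assumes `visited` is exactly the set of docs the graph searcher scored.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch, not Lucene's API: a threshold collector that also receives
 * the graph searcher's "visited" set, so visited-but-not-collected docs can be
 * rejected later without computing their similarity again.
 */
public class VisitedAwareCollector {
  private final float resultSimilarity;     // minimum similarity for a doc to be a result
  private final BitSet hits = new BitSet(); // docs collected as results
  private BitSet visited = new BitSet();    // docs scored during graph search

  public VisitedAwareCollector(float resultSimilarity) {
    this.resultSimilarity = resultSimilarity;
  }

  /** Called for a doc scored during graph search; returns true if it is a hit. */
  public boolean collect(int doc, float similarity) {
    if (similarity >= resultSimilarity) {
      hits.set(doc);
      return true;
    }
    return false;
  }

  /** Analogue of the proposed setVisited(Bits): hand over the searcher's visited set. */
  public void setVisited(BitSet visited) {
    this.visited = visited;
  }

  /** Visited (so already scored) but not collected: can never be a result. */
  public boolean isKnownMiss(int doc) {
    return visited.get(doc) && !hits.get(doc);
  }

  /** Only docs the graph search never visited still need a similarity computation. */
  public boolean needsScoring(int doc) {
    return !visited.get(doc);
  }
}
```

With this split, a later phase only computes similarities for `needsScoring` docs and drops `isKnownMiss` docs outright.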
If the `KnnCollector` knew what we have already scored during graph search (but is not present in the results), we could avoid recomputing those similarity scores in `exactSearch`, because we already know they are not in the `topK`. Right now we [score all vectors](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L177-L187) present in the `filter`, even if many of them were already scored and rejected during graph search.

[Here](https://github.com/apache/lucene/commit/2d6c0bfd4134b04c60be3864567211c824e7bc3c) are some very rough changes to support this. What do you think, @benwtrent?
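A minimal sketch of how an `exactSearch` fallback could reuse the graph search's work, assuming both the approximate top-k results (at most `k` entries) and the `visited` set of the early-terminated graph search are available. `ExactSearchSketch`, `ScoredDoc` and the dot-product similarity are hypothetical stand-ins, not Lucene's `AbstractKnnVectorQuery`. The reasoning it relies on: a filtered doc that was visited but is not in the approximate top k was outscored by the k collected docs, so it cannot be in the exact top k either. Seeding the heap with the approximate results therefore lets the loop skip every visited doc.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of an exact top-k search that skips docs already scored by graph search. */
public class ExactSearchSketch {
  /** A scored doc; the approximate results are passed in this form. */
  public record ScoredDoc(int doc, float score) {}

  /** Dot-product similarity (an assumption; any similarity function works here). */
  static float similarity(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static List<ScoredDoc> exactSearch(
      float[][] vectors, float[] query, BitSet filter, BitSet visited,
      List<ScoredDoc> approxTopK, int k) {
    // Min-heap on score: the root is the weakest of the current top k.
    PriorityQueue<ScoredDoc> heap =
        new PriorityQueue<>((x, y) -> Float.compare(x.score(), y.score()));
    heap.addAll(approxTopK); // seed with what the graph search already collected
    for (int doc = filter.nextSetBit(0); doc >= 0; doc = filter.nextSetBit(doc + 1)) {
      if (visited.get(doc)) {
        continue; // already scored: either in approxTopK or a known miss
      }
      float score = similarity(query, vectors[doc]);
      if (heap.size() < k) {
        heap.add(new ScoredDoc(doc, score));
      } else if (score > heap.peek().score()) {
        heap.poll(); // evict the weakest to keep exactly k
        heap.add(new ScoredDoc(doc, score));
      }
    }
    List<ScoredDoc> result = new ArrayList<>(heap);
    result.sort((x, y) -> Float.compare(y.score(), x.score())); // best first
    return result;
  }
}
```

Only unvisited docs in the filter are scored here, which is the saving the proposal is after when many filtered docs were already touched by the graph search.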