kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768862106
> the Collector is full by flagging "incomplete" (I think this is possible) once a threshold is reached

Do you mean that we return incomplete results? Instead, maybe we can:

1. Ask the user for a sane limit on the number of nodes to visit
2. If this limit is reached (possibly when the supplied `traversalThreshold` is too low), then we break out of HNSW search
3. Now instead of performing a [greedy `#exactSearch`](https://github.com/kaivalnp/lucene/blob/radius-based-vector-search/lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java#L53-L74) and collecting everything into a list, we return a `TwoPhaseIterator` where the [`#matches`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/TwoPhaseIterator.java#L112) call performs the underlying dot product comparison and returns `true` or `false` based on whether the computed score is above the `resultThreshold`
4. This way, we can perform an "exact search" lazily, and only compute vector similarity on required documents (for example: if this query is a child of some `BooleanQuery`, then the actual number of documents for which we'll need to compute similarity is greatly reduced). The worst case will still be an exact search on all documents

This "lazy-loading" works very well for our use case because whether a vector matches our query is independent of other vectors (unlike in K-NN, where given a query and an arbitrary doc vector, we cannot say whether the doc vector will be in the `topK` results of the query)

Is this what you had in mind earlier @jpountz?

> I will try and replicate with Lucene Util.

Yes, I took inspiration from [`KnnGraphTester`](https://github.com/mikemccand/luceneutil/blob/master/src/main/KnnGraphTester.java) to write a local benchmark, but may have made some silly mistakes. It'll be good to get an independent set of benchmark results.
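To make steps 3 and 4 of the list above a bit more concrete, here is a rough plain-Java sketch of the lazy check. It deliberately does not use Lucene classes: `LazyRadiusMatcher`, its `matches` and `collect` methods, and the in-memory `vectors` array are all hypothetical stand-ins, and the inclusive `>=` comparison is an assumption. In the real query the per-document check would live in `TwoPhaseIterator#matches`, and the candidates would come from whatever approximation the enclosing query drives.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java stand-in for the lazy exact search idea; no Lucene classes are used. */
public class LazyRadiusMatcher {
    private final float[][] vectors;      // hypothetical per-doc vectors, indexed by doc id
    private final float[] query;
    private final float resultThreshold;
    private int comparisons = 0;          // counts how many similarities were actually computed

    public LazyRadiusMatcher(float[][] vectors, float[] query, float resultThreshold) {
        this.vectors = vectors;
        this.query = query;
        this.resultThreshold = resultThreshold;
    }

    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Analogue of a matches() call: the score for a doc is computed only when
     * that doc is asked about. The inclusive comparison is an assumption.
     */
    public boolean matches(int doc) {
        comparisons++;
        return dotProduct(query, vectors[doc]) >= resultThreshold;
    }

    public int comparisons() {
        return comparisons;
    }

    /** Confirms only the candidate docs handed in, e.g. those another clause already matched. */
    public List<Integer> collect(int[] candidates) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : candidates) {
            if (matches(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        float[][] vectors = {{1f, 0f}, {0f, 1f}, {0.9f, 0.1f}, {0.5f, 0.5f}};
        float[] query = {1f, 0f};
        LazyRadiusMatcher matcher = new LazyRadiusMatcher(vectors, query, 0.8f);
        int[] candidates = {2, 3}; // pretend a sibling clause narrowed four docs down to two
        System.out.println(matcher.collect(candidates) + " after " + matcher.comparisons() + " comparisons");
    }
}
```

The point of the sketch is that each document is decided on its own score, so nothing has to be ranked or buffered, and documents that never reach `matches` never cost a similarity computation.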
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org