shubhamvishu commented on PR #14963: URL: https://github.com/apache/lucene/pull/14963#issuecomment-3089205957
@jpountz Ahh, I see what you are pointing towards. Here is what I think we could try:

- We currently also fall back to exact search after the `visitedLimit` is breached in HNSW search, so that same visited limit would now apply when we iterate over the docs. Net-net, `approximateKnn (visit V nodes) + exactSearch` ~= `exactSearch (visit V nodes linearly) + exactSearch`, which I think might not impact the search time? So one way is to simply accept this, since we will visit only a small number of docs, but I agree we can optimize this path further (more on this in the points below).
- We could completely remove the fallback to exactSearch in `AbstractKnnVectorQuery`, and relax the check from
  - `if (knnCollector.earlyTerminated())`

  to
  - `if (knnCollector instanceof TimeLimitingKnnCollectorManager.TimeLimitingKnnCollector && ((TimeLimitingKnnCollectorManager.TimeLimitingKnnCollector)knnCollector).shouldExit())`

  after making `TimeLimitingKnnCollector` public and exposing `shouldExit()`. This would ensure we continue the exact search in `VectorsReader` and don't fall back to exactSearch in `AbstractKnnVectorQuery` (we can maybe do better, more on that below).
- Though I think `AbstractKnnVectorQuery#exactSearch` is the better place for exact search, since it uses a conjunctive `DocIdSetIterator` rather than iterating over all the docs? If yes, then we could maybe simply add an `else if` condition in `VectorsReader` that straightaway overwhelms the collector (forcing its `earlyTerminated()` to return true) and returns, so the query automatically falls back to the best exactSearch impl (best of both worlds):

```
else if (getGraph(fieldEntry).equals(HnswGraph.EMPTY)) {
  // Fall back to exactSearch directly
  knnCollector.incVisitedCount((int) knnCollector.visitLimit() + 1);
}
```

Let me know your thoughts or if I'm missing something here. Thanks!
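To make the last option concrete, here is a toy sketch of the "saturate the collector so the query layer falls back" idea. Everything in it is made up for illustration (`ToyCollector`, `readerSearch`, `search`); it only models visit accounting, under the assumption that `earlyTerminated()` means "visited count has reached the visit limit", and it is not Lucene's actual code. It also computes the increment as a clamped `long`, on the assumption that the visit limit may be as large as `Integer.MAX_VALUE`, in which case `(int) limit + 1` would overflow.

```java
public class VisitLimitFallbackSketch {

  /** Hypothetical stand-in for a kNN collector; it only tracks visit counts. */
  static final class ToyCollector {
    private final long visitLimit;
    private long visitedCount;

    ToyCollector(long visitLimit) {
      this.visitLimit = visitLimit;
    }

    long visitLimit() {
      return visitLimit;
    }

    void incVisitedCount(int count) {
      visitedCount += count;
    }

    // Assumption: "early terminated" means the visit budget is used up.
    boolean earlyTerminated() {
      return visitedCount >= visitLimit;
    }
  }

  /** Toy reader-side search: with no graph, use up the budget and return at once. */
  static void readerSearch(boolean graphIsEmpty, ToyCollector collector) {
    if (graphIsEmpty) {
      // Clamp in long arithmetic so the int argument cannot overflow.
      long budget = Math.min((long) Integer.MAX_VALUE, collector.visitLimit());
      collector.incVisitedCount((int) budget);
      return;
    }
    // Pretend the graph search visited a single node.
    collector.incVisitedCount(1);
  }

  /** Toy query-side logic: fall back to exact search if the collector gave up. */
  static String search(boolean graphIsEmpty, long visitLimit) {
    ToyCollector collector = new ToyCollector(visitLimit);
    readerSearch(graphIsEmpty, collector);
    return collector.earlyTerminated() ? "exact" : "approximate";
  }

  public static void main(String[] args) {
    System.out.println(search(true, 100));
    System.out.println(search(false, 100));
    System.out.println(search(true, Integer.MAX_VALUE));
  }
}
```

With this shape the reader never needs to know how the exact search is implemented; it only signals "budget exhausted" and the query layer picks its own fallback.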