jtibshirani edited a comment on pull request #656: URL: https://github.com/apache/lucene/pull/656#issuecomment-1032109021
I tried out the around stopping the HNSW search early if it visits too many docs. To test, I modified `KnnGraphTester` to create `acceptDocs` uniformly at random with a certain selectivity, then measured recall and QPS. Here are the results on glove-100-angular (~1.2 million docs) with a filter selectivity 0.01: **Baseline** ``` k Recall VisitedDocs QPS 10 0.774 15957 232.083 50 0.930 63429 58.994 80 0.958 95175 42.470 100 0.967 118891 35.203 500 0.997 1176237 8.136 800 0.999 1183514 5.571 ``` **PR** ``` k Recall VisitedDocs QPS 10 1.000 22908 190.286 50 1.000 23607 152.224 80 1.000 23608 148.036 100 1.000 23608 145.381 500 1.000 23608 138.903 800 1.000 23608 137.882 ``` Since the filter is so selective, HNSW always visits more than 1% of the docs. The adaptive logic in the PR decides to stop the search and switch to an exact search, which bounds the visited docs at 2%. For `k=10` this makes the QPS a little worse, but overall prevents QPS from degrading (with the side benefit of perfect recall). I also tested with less restrictive filters, and in these cases the fallback just doesn't kick in, so the QPS remains the same as before. Overall I like this approach because it doesn't require us to fiddle with thresholds or expose new parameters. It could also help make HNSW more robust in "pathological" cases where even when the filter is not that selective, all the nearest vectors to a query happen to be filtered away. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org