jtibshirani edited a comment on pull request #656:
URL: https://github.com/apache/lucene/pull/656#issuecomment-1032109021


   I tried out the around stopping the HNSW search early if it visits too many 
docs. To test, I modified `KnnGraphTester` to create `acceptDocs` uniformly at 
random with a certain selectivity, then measured recall and QPS. Here are the 
results on glove-100-angular (~1.2 million docs) with a filter selectivity 0.01:
   
   **Baseline**
   ```
   k        Recall    VisitedDocs     QPS  
   10        0.774       15957     232.083
   50        0.930       63429      58.994
   80        0.958       95175      42.470
   100       0.967      118891      35.203
   500       0.997     1176237       8.136
   800       0.999     1183514       5.571
   ```
   
   **PR**
   ```
   k        Recall    VisitedDocs     QPS  
   10        1.000             22908     190.286
   50        1.000             23607     152.224
   80        1.000             23608     148.036
   100       1.000             23608     145.381
   500       1.000             23608     138.903
   800       1.000             23608     137.882
   ```
   
   Since the filter is so selective, HNSW always visits more than 1% of the 
docs. The adaptive logic in the PR decides to stop the search and switch to an 
exact search, which bounds the visited docs at 2%. For `k=10` this makes the 
QPS a little worse, but overall prevents QPS from degrading (with the side 
benefit of perfect recall). I also tested with less restrictive filters, and in 
these cases the fallback just doesn't kick in, so the QPS remains the same as 
before.
   
   Overall I like this approach because it doesn't require us to fiddle with 
thresholds or expose new parameters. It could also help make HNSW more robust 
in "pathological" cases where even when the filter is not that selective, all 
the nearest vectors to a query happen to be filtered away.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to