kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1834591675

   Thanks @benwtrent! I also simplified the queries:
   
   I realized that the API may be difficult to use in its current state (we are 
leaving two parameters, `traversalSimilarity` and `visitLimit`, up to the user 
to configure, which may be a large overhead)
   
   I noticed from above benchmarks that `traversalSimilarity` is good for 
tuning (acts like the `fanout` equivalent of `topK`) but most users need not 
change this -- and we can keep it equal to `resultSimilarity` by default (but 
still allow configuring it, whenever required)
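   
   A rough sketch of that default, with hypothetical names (the class, its 
constructors, and the range check are assumptions for illustration, not the 
PR's actual code):

```java
/** Hypothetical holder for the two thresholds; not the PR's actual class. */
final class SimilarityThresholds {
  final float traversalSimilarity;
  final float resultSimilarity;

  /** Common case: only the result threshold is given, traversal defaults to it. */
  SimilarityThresholds(float resultSimilarity) {
    this(resultSimilarity, resultSimilarity);
  }

  /** Tuning case: a lower traversal threshold explores more of the graph. */
  SimilarityThresholds(float traversalSimilarity, float resultSimilarity) {
    // Assumption: a traversal threshold above the result threshold is rejected,
    // since it would stop traversal before any result could be collected.
    if (traversalSimilarity > resultSimilarity) {
      throw new IllegalArgumentException(
          "traversalSimilarity must not exceed resultSimilarity");
    }
    this.traversalSimilarity = traversalSimilarity;
    this.resultSimilarity = resultSimilarity;
  }
}
```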
   
   Another issue previously encountered (amplified by the above change) is that 
we stop graph search too early when the entry node is far away from the query. 
To overcome this, can we continue search as long as we find better scoring 
nodes (so we know there is a possibility of reaching nodes above 
`resultSimilarity`)?
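   
   To make the idea concrete, here is a minimal sketch on a toy graph with 
precomputed scores. It is not the HNSW code in the PR; `ThresholdGraphSearch`, 
the adjacency array, and the `bestSeen` bookkeeping are all assumptions made 
for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Toy best-first search; a stand-in for graph search, not Lucene's implementation. */
final class ThresholdGraphSearch {
  static List<Integer> search(
      int[][] neighbors, float[] scores, int entry,
      float traversalSimilarity, float resultSimilarity) {
    boolean[] visited = new boolean[scores.length];
    // Candidates ordered by score, best first.
    PriorityQueue<Integer> candidates =
        new PriorityQueue<>((a, b) -> Float.compare(scores[b], scores[a]));
    List<Integer> results = new ArrayList<>();

    visited[entry] = true;
    candidates.add(entry);
    float bestSeen = scores[entry];
    if (scores[entry] >= resultSimilarity) {
      results.add(entry);
    }

    while (!candidates.isEmpty()) {
      int node = candidates.poll();
      // Stop only when the best remaining candidate is below the traversal
      // threshold AND is not the best-scoring node seen so far, i.e. the
      // search has stopped improving.
      if (scores[node] < traversalSimilarity && scores[node] < bestSeen) {
        break;
      }
      for (int nbr : neighbors[node]) {
        if (visited[nbr]) {
          continue;
        }
        visited[nbr] = true;
        if (scores[nbr] >= resultSimilarity) {
          results.add(nbr);
        }
        bestSeen = Math.max(bestSeen, scores[nbr]);
        candidates.add(nbr);
      }
    }
    return results;
  }
}
```

   With an entry node that scores far below the threshold, a plain "stop when 
the best candidate is below `traversalSimilarity`" check would exit right away; 
the extra `bestSeen` condition lets the search keep climbing while scores improve.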
   
   For configuring `visitLimit`, the best option seems to be adding a 
`filter` (like in `AbstractKnnVectorQuery`), where we determine the 
`visitLimit` from the cost of the filter, and fall back to exact search over 
the filtered docs once this limit is reached.
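   
   Roughly, the flow could look like the sketch below. The 
`ApproximateSearch` interface, the `BitSet` filter, and the precomputed 
`scores` array are hypothetical stand-ins, not Lucene APIs; the filter's 
cardinality is used here as its "cost":

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Optional;

final class FilteredThresholdSearch {
  /** Hypothetical stand-in for a graph search that gives up after visitLimit nodes. */
  interface ApproximateSearch {
    /** Returns matching doc ids, or empty if visitLimit was reached first. */
    Optional<List<Integer>> search(BitSet acceptDocs, int visitLimit);
  }

  static List<Integer> search(
      ApproximateSearch approximate, BitSet acceptDocs,
      float[] scores, float resultSimilarity) {
    // Assumption: the filter's cost is the number of docs it accepts.
    int visitLimit = acceptDocs.cardinality();

    Optional<List<Integer>> hits = approximate.search(acceptDocs, visitLimit);
    if (hits.isPresent()) {
      return hits.get();
    }

    // Limit reached: fall back to scoring every filtered doc exactly.
    List<Integer> exact = new ArrayList<>();
    for (int doc = acceptDocs.nextSetBit(0); doc >= 0; doc = acceptDocs.nextSetBit(doc + 1)) {
      if (scores[doc] >= resultSimilarity) {
        exact.add(doc);
      }
    }
    return exact;
  }
}
```

   The exact pass never does more similarity computations than the filter 
accepts docs, so the filter cost is a natural upper bound for the graph search.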
   
   Here is the benchmark setup and results with these changes (same range as 
before): https://gist.github.com/kaivalnp/07d6a96d22adfad4d3cd5924b13ed524
   
   Also added some tests
   
   > I do worry a bit around the post-filtering. It seems likely in a 
restrictive search scenario, we would do a bunch of searching to no avail
   
   Agreed, we do some work in graph search (like similarity computations, 
collecting results, etc) - which should be reusable from exact search
   
   I had opened #12820 to discuss this issue (it also affects KNN queries) - 
perhaps we can include these similarity-based queries if we arrive at a 
solution there?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

