kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1834591675
Thanks @benwtrent!

I also simplified the queries: I realized that the API may be difficult to use in its current state (we are leaving two parameters, `traversalSimilarity` and `visitLimit`, up to the user to configure, which may be a large overhead).

I noticed from the above benchmarks that `traversalSimilarity` is good for tuning (it acts like the `fanout` equivalent of `topK`), but most users need not change it, so we can keep it equal to `resultSimilarity` by default (while still allowing it to be configured whenever required).

Another issue encountered previously (and amplified by the above change) is that we stop graph search too early when the entry node is far away from the query. To overcome this, can we continue the search as long as we find better scoring nodes (so we know there is a possibility of reaching nodes above `resultSimilarity`)?

For configuring `visitLimit`, it seems like the best option is to add a `filter` (like in `AbstractKnnVectorQuery`), where we determine the `visitLimit` from the cost of the filter and fall back to exact search over the filtered docs once this limit is reached.

Here is the benchmark setup and results with these changes (same range as before): https://gist.github.com/kaivalnp/07d6a96d22adfad4d3cd5924b13ed524

Also added some tests.

> I do worry a bit around the post-filtering. It seems likely in a restrictive search scenario, we would do a bunch of searching to no avail

Agreed, we do some work in graph search (like similarity computations, collecting results, etc.) which should be reusable from exact search.

I had opened #12820 to discuss this issue (it also affects KNN queries). Perhaps we can include these similarity-based queries if we arrive at a solution there?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
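To make the traversal idea in the comment above concrete, here is a rough sketch on a toy in-memory graph. It is not the code from the PR and does not use any Lucene API: the class `SimilaritySearchSketch`, its methods, the dot-product similarity, and the exact rule for "continue while we find better scoring nodes" are all assumptions made for illustration. Exact search here scans every vector, where the real query would scan only the filtered docs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; names and structure are not taken from Lucene.
final class SimilaritySearchSketch {

  // Dot product as the similarity function (an assumption; any similarity would do).
  static double similarity(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Brute-force fallback: every node scoring at or above resultSimilarity.
  static List<Integer> exactSearch(double[][] vectors, double[] query, double resultSimilarity) {
    List<Integer> results = new ArrayList<>();
    for (int node = 0; node < vectors.length; node++) {
      if (similarity(vectors[node], query) >= resultSimilarity) {
        results.add(node);
      }
    }
    return results;
  }

  // Graph search that collects nodes at or above resultSimilarity.
  // A newly visited node is queued for expansion if it is at or above
  // traversalSimilarity, or if it beats the best score seen so far (the
  // assumed reading of "continue as long as we find better scoring nodes").
  // Once visitLimit nodes have been visited, it falls back to exact search.
  static List<Integer> search(double[][] vectors, int[][] neighbors, int entry, double[] query,
      double traversalSimilarity, double resultSimilarity, int visitLimit) {
    double[] score = new double[vectors.length];
    boolean[] visited = new boolean[vectors.length];
    // Max-heap on score, so the most promising candidate is expanded first.
    PriorityQueue<Integer> candidates =
        new PriorityQueue<>((a, b) -> Double.compare(score[b], score[a]));
    List<Integer> results = new ArrayList<>();

    score[entry] = similarity(vectors[entry], query);
    visited[entry] = true;
    int visitedCount = 1;
    double bestSeen = score[entry];
    candidates.add(entry);

    while (!candidates.isEmpty()) {
      int node = candidates.poll();
      if (score[node] >= resultSimilarity) {
        results.add(node);
      }
      for (int neighbor : neighbors[node]) {
        if (visited[neighbor]) {
          continue;
        }
        if (visitedCount >= visitLimit) {
          // Budget exhausted: give up on the graph and score everything.
          return exactSearch(vectors, query, resultSimilarity);
        }
        visited[neighbor] = true;
        visitedCount++;
        score[neighbor] = similarity(vectors[neighbor], query);
        if (score[neighbor] >= traversalSimilarity || score[neighbor] > bestSeen) {
          candidates.add(neighbor);
        }
        bestSeen = Math.max(bestSeen, score[neighbor]);
      }
    }
    Collections.sort(results);
    return results;
  }
}
```

In this sketch, an entry node that scores far below `traversalSimilarity` still leads somewhere as long as each hop improves on the best score seen; without that rule the search would end at the entry node with no results.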
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org