kaivalnp opened a new pull request, #15784:
URL: https://github.com/apache/lucene/pull/15784

   ### Description
   
   Lucene added support for similarity-based vector searches in #12679. Such a 
query aims to return _all_ results whose vector similarity score to the query 
vector is above a threshold (= `resultSimilarity`), as opposed to a KNN query, 
which aims to return the `topK` highest-scoring results for the query.
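
   To make the contrast concrete, here is a minimal brute-force sketch over 
precomputed scores. It is not Lucene code: the class and method names are 
hypothetical, and treating the threshold as inclusive (`>=`) is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
public class ThresholdVsTopK {

  /** Similarity-style search: every doc whose score reaches the threshold. */
  static List<Integer> aboveThreshold(float[] scores, float resultSimilarity) {
    List<Integer> hits = new ArrayList<>();
    for (int doc = 0; doc < scores.length; doc++) {
      // Assumption: the comparison is inclusive.
      if (scores[doc] >= resultSimilarity) {
        hits.add(doc);
      }
    }
    return hits;
  }

  /** KNN-style search: the k best docs, however low their scores are. */
  static List<Integer> topK(float[] scores, int k) {
    List<Integer> docs = new ArrayList<>();
    for (int doc = 0; doc < scores.length; doc++) {
      docs.add(doc);
    }
    // Sort by descending score.
    docs.sort(Comparator.comparingDouble((Integer doc) -> -scores[doc]));
    return new ArrayList<>(docs.subList(0, Math.min(k, docs.size())));
  }
}
```

   The result size of `aboveThreshold` depends on the data, while `topK` always 
returns `k` docs (when at least `k` exist).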
   
   `[Byte|Float]VectorSimilarityQuery` provides an approximate version of this 
search, using a [special 
collector](https://github.com/apache/lucene/blob/f021aa55853c8b446404c8616ec247027774ae07/lucene/core/src/java/org/apache/lucene/search/VectorSimilarityCollector.java#L27)
 to traverse and collect results from existing HNSW graphs.
   
   In the upper levels of the HNSW graph, the search algorithm is the same as 
for KNN: it finds the single best entry point for the actual search in the last 
layer.
   
   In the last layer: starting with the entry node, all nodes having a score 
above a user-specified `traversalSimilarity` are traversed, and all traversed 
nodes having a score above `resultSimilarity` are collected as results.
   
   To protect against the adversarial case of the entry node lying outside 
`traversalSimilarity`, the search has an additional clause that continues 
traversal as long as better-scoring nodes are available (i.e. the search moves 
towards the vicinity of the query).
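
   The last-layer behaviour described above can be sketched on a toy graph as 
follows. This is not the collector's actual code: the class name, the 
adjacency-array graph, and the precomputed `scores` are hypothetical, and 
expressing the extra clause as `min(traversalSimilarity, bestSeen)` is an 
assumption about one way to implement it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of Lucene.
public class ThresholdTraversalSketch {

  /**
   * neighbors[n] lists the nodes adjacent to n in the last layer;
   * scores[n] is the similarity of node n to the query vector.
   */
  static List<Integer> search(
      int[][] neighbors,
      float[] scores,
      int entry,
      float traversalSimilarity,
      float resultSimilarity) {
    List<Integer> results = new ArrayList<>();
    BitSet visited = new BitSet(scores.length);
    // Best-first order: the highest-scoring candidate is expanded first.
    PriorityQueue<Integer> candidates =
        new PriorityQueue<>((a, b) -> Float.compare(scores[b], scores[a]));

    float bestSeen = scores[entry];
    visited.set(entry);
    candidates.add(entry);

    while (!candidates.isEmpty()) {
      int node = candidates.poll();
      if (scores[node] >= resultSimilarity) {
        results.add(node); // traversed and good enough to be a result
      }
      for (int next : neighbors[node]) {
        if (visited.get(next)) {
          continue;
        }
        visited.set(next);
        // Assumed form of the extra clause: while nothing has reached
        // traversalSimilarity yet, keep following nodes that are at least
        // as good as the best one seen so far.
        float bar = Math.min(traversalSimilarity, bestSeen);
        if (scores[next] >= bar) {
          candidates.add(next);
          bestSeen = Math.max(bestSeen, scores[next]);
        }
      }
    }
    return results;
  }
}
```

   On a chain of nodes whose scores rise from the entry node towards the 
query, this walk reaches the high-scoring region even when the entry node 
itself scores below `traversalSimilarity`.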
   
   However, this clause is susceptible to getting caught in a local maximum, 
in which case the search terminates before reaching the vicinity of the query. 
Another difficulty is choosing `traversalSimilarity` for an ideal recall vs. 
latency tradeoff: some queries in sparse spaces need a larger buffer, which is 
unnecessary in denser spaces.
   
   To counter both of these, this PR proposes making the graph traversal 
similarity adaptive: it starts at a low value and moves towards 
`resultSimilarity` with an exponential decay as low-scoring nodes are 
encountered.
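
   The description does not spell out the update rule, so the following is 
only one possible reading, not the PR's implementation. It assumes the gap 
between the current traversal threshold and `resultSimilarity` shrinks by a 
constant factor each time a node scoring below `resultSimilarity` is seen. The 
class name, the `start` value, and the `decay` factor are all hypothetical.

```java
// Hypothetical illustration only; the actual rule in the PR may differ.
public class AdaptiveTraversalThreshold {
  private final float resultSimilarity;
  private final float decay; // assumed constant factor in (0, 1)
  private float gap;         // distance between threshold and resultSimilarity

  AdaptiveTraversalThreshold(float start, float resultSimilarity, float decay) {
    if (start > resultSimilarity) {
      throw new IllegalArgumentException("start must not exceed resultSimilarity");
    }
    if (!(decay > 0f && decay < 1f)) {
      throw new IllegalArgumentException("decay must be in (0, 1)");
    }
    this.resultSimilarity = resultSimilarity;
    this.decay = decay;
    this.gap = resultSimilarity - start;
  }

  /** Current traversal threshold; never exceeds resultSimilarity. */
  float current() {
    return resultSimilarity - gap;
  }

  /** Record a visited node's score; low scores tighten the threshold. */
  void onScore(float score) {
    if (score < resultSimilarity) {
      gap *= decay; // exponential decay of the remaining gap
    }
  }
}
```

   Under these assumptions, a query in a sparse region keeps a wide buffer 
only until low-scoring nodes start to dominate, after which the threshold 
converges towards `resultSimilarity` and the traversal winds down.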


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

