msokolov opened a new pull request, #14226:
URL: https://github.com/apache/lucene/pull/14226

   ### Description
   
   This is a WIP patch to work out an idea for Knn hit collection that is 
deterministic and efficient in the sense that the number of hits collected per 
leaf scales with the size of the leaf. The idea is:
   
   1. Run a pro-rated search that (optimistically) assumes the top-K documents are
uniformly distributed among leaves, scaling k to kLeaf in proportion to the
fraction of the index's documents held by each leaf.
   2. Examine the per-leaf results. Any leaf whose minimum score among its kLeaf
results is >= the global minimum score (the lowest score in the top K merged
across all leaves) has every one of its hits at or above the global cutoff, so it
may hold more competitive documents; submit it for further exploration using
seeded search, starting from its previous best results.
   3. Repeat step 2 until every leaf's minimum score is worse than the global
minimum score, or some other limiting condition is reached.
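   A minimal, self-contained Java sketch of this loop follows. It rests on
several assumptions: each leaf is modeled as a plain array of precomputed
similarity scores, the per-leaf "search" is an exact top-k over that array (a
stand-in for the real approximate graph search), and "further exploration" is
modeled by re-searching the leaf with a doubled kLeaf rather than by a true
seeded search. How far to extend a qualifying leaf is not specified above, so the
doubling policy and the `maxRounds` cap are our own choices, and
`ProRatedKnnSketch`, `searchLeaf` and `mergeTopK` are hypothetical names, not
Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ProRatedKnnSketch {

    /** One hit: leaf ordinal, doc id within that leaf, similarity score (higher is better). */
    record Hit(int leaf, int doc, float score) {}

    // Best score first; ties broken by (leaf, doc) so the merged order never varies.
    static final Comparator<Hit> BEST_FIRST =
        Comparator.comparingDouble((Hit h) -> h.score()).reversed()
            .thenComparingInt(Hit::leaf)
            .thenComparingInt(Hit::doc);

    /** Hypothetical stand-in for per-leaf approximate search: exact top-k over precomputed scores. */
    static List<Hit> searchLeaf(int leaf, float[] scores, int k) {
        List<Hit> hits = new ArrayList<>();
        for (int doc = 0; doc < scores.length; doc++) {
            hits.add(new Hit(leaf, doc, scores[doc]));
        }
        hits.sort(BEST_FIRST);
        return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
    }

    /** Merges all per-leaf hits and keeps the global top k. */
    static List<Hit> mergeTopK(List<List<Hit>> perLeaf, int k) {
        List<Hit> all = new ArrayList<>();
        perLeaf.forEach(all::addAll);
        all.sort(BEST_FIRST);
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    static List<Hit> search(float[][] leaves, int k, int maxRounds) {
        if (k <= 0) {
            return List.of();
        }
        int total = Arrays.stream(leaves).mapToInt(leaf -> leaf.length).sum();
        int[] kLeaf = new int[leaves.length];
        List<List<Hit>> perLeaf = new ArrayList<>();

        // Pro-rated first pass: each leaf gets k scaled by its share of the documents.
        for (int i = 0; i < leaves.length; i++) {
            kLeaf[i] = (int) Math.ceil((double) k * leaves[i].length / total);
            perLeaf.add(searchLeaf(i, leaves[i], kLeaf[i]));
        }

        // Re-examination rounds: revisit leaves whose worst hit is still >= the global cutoff.
        for (int round = 0; round < maxRounds; round++) {
            List<Hit> top = mergeTopK(perLeaf, k);
            if (top.size() < k) {
                break; // fewer than k docs overall, so every leaf is already exhausted
            }
            float globalMin = top.get(k - 1).score();
            boolean revisited = false;
            for (int i = 0; i < leaves.length; i++) {
                List<Hit> hits = perLeaf.get(i);
                if (hits.isEmpty() || hits.size() == leaves[i].length) {
                    continue; // nothing left to find in this leaf
                }
                float leafMin = hits.get(hits.size() - 1).score();
                if (leafMin >= globalMin) {
                    kLeaf[i] *= 2; // hypothetical growth policy
                    perLeaf.set(i, searchLeaf(i, leaves[i], kLeaf[i]));
                    revisited = true;
                }
            }
            if (!revisited) {
                break;
            }
        }
        return mergeTopK(perLeaf, k);
    }

    public static void main(String[] args) {
        // Skewed data: leaf 0 holds all of the best documents, violating the uniform assumption.
        float[][] leaves = {
            {0.9f, 0.8f, 0.7f, 0.6f},
            {0.1f, 0.2f, 0.3f, 0.4f},
        };
        search(leaves, 4, 10).forEach(System.out::println);
    }
}
```

   On the skewed example, the first pass keeps only 2 hits per leaf, so the merged
top 4 mixes both leaves; the re-examination round then pulls the remaining
high-scoring documents out of leaf 0, and the final top 4 all come from leaf 0.
Passing `maxRounds = 0` returns the plain pro-rated result.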
   
   When the uniform-distribution assumption holds, we should be able to skip the
re-examination rounds, so performance should be similar to the simple pro-rated
algorithm. Otherwise we pay the cost of re-entering search, but we are guaranteed
to get the global (approximate) top K deterministically (with no cross-thread
communication during search), and seeding with the best results found so far
should hopefully keep that extra cost low.
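   The determinism argument relies on each leaf's search depending only on that
leaf and its inputs, with all per-leaf work finished before any merge. The sketch
below illustrates that shape with a standard `ExecutorService`: one task per leaf,
no shared mutable state, and a merge under a total order (score, then leaf, then
doc) applied only after every task completes. Leaves are again modeled as arrays
of precomputed scores, and `DeterministicLeafMerge`, `searchLeaf` and `searchAll`
are hypothetical names, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DeterministicLeafMerge {

    record Hit(int leaf, int doc, float score) {}

    // Total order: score descending, then leaf, then doc, so ties never depend on thread timing.
    static final Comparator<Hit> BEST_FIRST =
        Comparator.comparingDouble((Hit h) -> h.score()).reversed()
            .thenComparingInt(Hit::leaf)
            .thenComparingInt(Hit::doc);

    /** Stand-in for one leaf's search: reads only its own leaf and shares no mutable state. */
    static List<Hit> searchLeaf(int leaf, float[] scores, int kLeaf) {
        List<Hit> hits = new ArrayList<>();
        for (int doc = 0; doc < scores.length; doc++) {
            hits.add(new Hit(leaf, doc, scores[doc]));
        }
        hits.sort(BEST_FIRST);
        return new ArrayList<>(hits.subList(0, Math.min(kLeaf, hits.size())));
    }

    /** One task per leaf; results are merged only after every task has finished. */
    static List<Hit> searchAll(float[][] leaves, int[] kLeaf, int k, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (int i = 0; i < leaves.length; i++) {
                final int leaf = i;
                futures.add(pool.submit(() -> searchLeaf(leaf, leaves[leaf], kLeaf[leaf])));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> future : futures) {
                try {
                    merged.addAll(future.get()); // blocks until that leaf's task is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            merged.sort(BEST_FIRST);
            return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        float[][] leaves = {{0.5f, 0.9f, 0.5f}, {0.5f, 0.7f}, {0.9f, 0.1f, 0.5f}};
        int[] kLeaf = {2, 1, 2};
        List<Hit> oneThread = searchAll(leaves, kLeaf, 3, 1);
        List<Hit> fourThreads = searchAll(leaves, kLeaf, 3, 4);
        System.out.println(oneThread.equals(fourThreads));
        System.out.println(oneThread);
    }
}
```

   Running with one thread or four should print `true` for the equality check,
because nothing a task computes depends on when other tasks run. The same
reasoning carries over to the follow-up rounds, since the choice of which leaves
to re-search is made from the merged results alone.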
   
   This is still only a WIP patch because the class design isn't great yet: it
re-uses a lot of code from `SeededKnnVectorQuery`. I'm wondering whether a better
approach would be to make this the default and merge it directly into
`AbstractKnnVectorQuery`, or whether it could instead be triggered by a
`KnnSearchStrategy`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

