msokolov opened a new pull request, #14226: URL: https://github.com/apache/lucene/pull/14226
### Description

This is a WIP patch to work out an idea for Knn hit collection that is deterministic and efficient, in the sense that the number of hits collected per leaf scales with the size of the leaf. The idea is:

1. Run a pro-rated search, optimistically assuming that the top-K documents are distributed uniformly at random among leaves, by scaling k to kLeaf according to the proportion of documents in the leaf.
3. Examine the list of per-leaf results. Any leaf whose minimum score among its kLeaf results is >= the global minimum score (the lowest score in the merged top K across all leaves) is submitted for further exploration, using seeded search that starts from the previous best results.
4. Repeat 2, 3 until every leaf's minimum score is worse than the global minimum score, or other limiting conditions are reached.

When the presumption of uniform distribution holds, we can skip steps 3 and 4, so performance should be similar to the simple pro-rated algorithm. Otherwise we pay the cost of re-entering search, but we are guaranteed to get the global (approximate) top K in a deterministic way (without cross-thread communication during search), and hopefully that cost is minimized by seeding with the best results found so far.

This is only a patch because the class design isn't really great yet: it re-uses a bunch of code from SeededKnnVectorQuery. My thinking is that a better way might be to make this the default and merge it directly into AbstractKnnVectorQuery? Or it could possibly be triggered by a KnnSearchStrategy?
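To make the collection loop concrete, here is a minimal, self-contained Java simulation of steps 1, 3 and 4. It is a sketch under loudly hypothetical assumptions, not the patch's code and not Lucene API: each leaf is just an array of precomputed scores, `searchLeaf` is an exact top-n selection standing in for the per-leaf HNSW search, and re-exploration simply doubles the leaf's budget instead of running the seeded search the patch uses. The class and method names (`OptimisticKnnSketch`, `proratedK`, `searchLeaf`, `collect`, `mergeTopK`) are made up for illustration, and kLeaf here is plain proportional scaling rounded up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical simulation of optimistic, pro-rated per-leaf Knn collection.
 * Leaves are plain score arrays; real code would run an approximate graph
 * search per leaf and seed re-entry with the previous best results.
 */
public class OptimisticKnnSketch {

  /** Step 1: scale k by the leaf's share of all documents (rounded up, at least 1). */
  static int proratedK(int k, int leafDocs, int totalDocs) {
    return Math.max(1, (int) Math.ceil((double) k * leafDocs / totalDocs));
  }

  /** Hypothetical stand-in for a per-leaf search: the n highest scores, best first. */
  static float[] searchLeaf(float[] leafScores, int n) {
    float[] sorted = leafScores.clone();
    Arrays.sort(sorted); // ascending
    int m = Math.min(n, sorted.length);
    float[] top = new float[m];
    for (int i = 0; i < m; i++) {
      top[i] = sorted[sorted.length - 1 - i];
    }
    return top;
  }

  /** Merges all per-leaf results and keeps the k best scores, best first. */
  static float[] mergeTopK(float[][] results, int k) {
    List<Float> all = new ArrayList<>();
    for (float[] r : results) {
      for (float s : r) {
        all.add(s);
      }
    }
    all.sort((a, b) -> Float.compare(b, a)); // descending
    int m = Math.min(k, all.size());
    float[] out = new float[m];
    for (int i = 0; i < m; i++) {
      out[i] = all.get(i);
    }
    return out;
  }

  /** Steps 1, 3, 4: pro-rated pass, then re-explore competitive leaves until stable. */
  static float[] collect(float[][] leaves, int k, int maxRounds) {
    int totalDocs = 0;
    for (float[] leaf : leaves) {
      totalDocs += leaf.length;
    }
    int[] kLeaf = new int[leaves.length];
    float[][] results = new float[leaves.length][];
    for (int i = 0; i < leaves.length; i++) {
      kLeaf[i] = proratedK(k, leaves[i].length, totalDocs);
      results[i] = searchLeaf(leaves[i], kLeaf[i]);
    }
    for (int round = 0; round < maxRounds; round++) {
      float[] merged = mergeTopK(results, k);
      // Lowest score in the merged top k; with fewer than k hits every leaf is competitive.
      float globalMin = merged.length < k ? Float.NEGATIVE_INFINITY : merged[merged.length - 1];
      boolean expanded = false;
      for (int i = 0; i < leaves.length; i++) {
        float[] r = results[i];
        boolean exhausted = r.length == leaves[i].length;
        if (!exhausted && r[r.length - 1] >= globalMin) {
          // Hypothetical re-exploration: double this leaf's budget (stand-in for seeded search).
          kLeaf[i] *= 2;
          results[i] = searchLeaf(leaves[i], kLeaf[i]);
          expanded = true;
        }
      }
      if (!expanded) {
        return merged; // every non-exhausted leaf's minimum is below the global minimum
      }
    }
    return mergeTopK(results, k); // round limit reached
  }

  public static void main(String[] args) {
    // Skewed data: the small leaf holds all of the best-scoring documents.
    float[][] leaves = {
      {0.10f, 0.20f, 0.30f, 0.15f, 0.25f, 0.05f, 0.35f, 0.12f},
      {0.90f, 0.80f, 0.85f, 0.95f}
    };
    System.out.println(Arrays.toString(collect(leaves, 4, 0)));  // pro-rated pass only
    System.out.println(Arrays.toString(collect(leaves, 4, 10))); // with re-exploration
  }
}
```

With k = 4 and 12 documents, the 8-document leaf gets kLeaf = ceil(4 * 8 / 12) = 3 and the 4-document leaf gets kLeaf = ceil(4 * 4 / 12) = 2. The pro-rated pass alone prints `[0.95, 0.9, 0.35, 0.3]`. Because the small leaf's minimum (0.9) is >= the global minimum (0.3), it is re-explored once, and the loop then prints `[0.95, 0.9, 0.85, 0.8]` and stops, since the large leaf's minimum (0.25) is below the new global minimum (0.8) and the small leaf is exhausted. Note that every decision depends only on the merged per-leaf results, which is what keeps the outcome deterministic regardless of how leaves are scheduled across threads.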