aruggero commented on PR #16034: URL: https://github.com/apache/lucene/pull/16034#issuecomment-4379929575
Hi @benwtrent, I worked on the sibling expansion topic you discussed via email with @alessandrobenedetti. From your conversation:

> I wonder if we can cheat and when we find a nearest child, we simply gather and score ALL children of a parent ord, expecting them all to be near and bulk collecting and scoring them. This sort of dynamic exploration would allow min, max, and average score exploration (at some extra graph exploration cost). It might even make the baseline max score exploration faster. This will take some refactoring. I think if we made the KnnCollector interface keep track of the visited set, it could be done. It also unlocks things in Elastic search & Open search as we periodically want the nearest top paragraphs for each nearest parent doc.

Sorry for the long description of this draft PR! In it I reported all the changes made for the implementation and, above all, the benchmark results. It seems that this approach can improve the precision/recall of the returned results, but it does not reduce the overall computation time.

Let me know what you think about this :)

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
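The expansion idea quoted above can be illustrated with a small, self-contained Java sketch. It is hypothetical and is neither Lucene code nor the implementation in this PR: `SiblingExpansionSketch`, `expandSiblings`, `Aggregation` and the `visited`/`expanded` bit sets are illustrative names. It assumes block-join doc-id ordering (a parent's children occupy the doc ids right before the parent, and `parentBits` marks every parent) and uses a plain dot product as the similarity. When a child is reached, all of its siblings are scored in one pass and reduced to a MIN, MAX or AVG parent score. The siblings are then marked in a shared visited set so that a search using that set would not score them again.

```java
import java.util.BitSet;

// Hypothetical sketch, not Lucene code: bulk-score all siblings of a child hit
// and aggregate them into one parent score. Assumes block-join ordering, where a
// parent's children occupy the doc ids immediately before the parent doc.
class SiblingExpansionSketch {

  enum Aggregation { MIN, MAX, AVG }

  // Dot-product similarity between the query and one child vector.
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Scores every child of childDoc's parent and returns the aggregated parent
  // score, or Float.NaN if that parent was already expanded. Scored siblings are
  // marked in `visited` so a graph search sharing that set would skip them.
  static float expandSiblings(int childDoc, float[] query, float[][] vectors,
                              BitSet parentBits, BitSet visited, BitSet expanded,
                              Aggregation agg) {
    int parentDoc = parentBits.nextSetBit(childDoc);          // assumes every child has a parent
    if (expanded.get(parentDoc)) {
      return Float.NaN;                                       // siblings already bulk-scored
    }
    expanded.set(parentDoc);
    int firstChild = parentBits.previousSetBit(childDoc) + 1; // previous parent + 1, or 0
    float min = Float.POSITIVE_INFINITY;
    float max = Float.NEGATIVE_INFINITY;
    float sum = 0f;
    for (int doc = firstChild; doc < parentDoc; doc++) {
      float score = dot(query, vectors[doc]);
      visited.set(doc);
      min = Math.min(min, score);
      max = Math.max(max, score);
      sum += score;
    }
    if (agg == Aggregation.MIN) {
      return min;
    }
    if (agg == Aggregation.MAX) {
      return max;
    }
    return sum / (parentDoc - firstChild);                    // AVG
  }

  public static void main(String[] args) {
    // Docs 0-1 are children of parent 2; docs 3-5 are children of parent 6.
    BitSet parentBits = new BitSet();
    parentBits.set(2);
    parentBits.set(6);
    float[][] vectors = {
      {1f, 0f}, {0.5f, 0.5f}, null,
      {0f, 1f}, {0.2f, 0.8f}, {1f, 1f}, null
    };
    float[] query = {1f, 0f};
    BitSet visited = new BitSet();
    BitSet expanded = new BitSet();

    // The search reaches child 4: children 3, 4 and 5 are all scored at once.
    System.out.println(expandSiblings(4, query, vectors, parentBits, visited, expanded, Aggregation.MAX)); // 1.0
    // Reaching sibling 5 later costs no extra vector comparisons.
    System.out.println(expandSiblings(5, query, vectors, parentBits, visited, expanded, Aggregation.MAX)); // NaN
  }
}
```

The sketch tracks expanded parents separately from `visited` so that a child scored by the regular graph search does not block expansion of its parent. The extra cost mentioned in the quote shows up here as the loop over every sibling, whether or not that sibling would ever have been reached by the graph search.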
