Re: [PR] DiversifyingChildren speedup - siblings expansion [lucene]

via GitHub Tue, 05 May 2026 06:54:56 -0700


aruggero commented on PR #16034:
URL: https://github.com/apache/lucene/pull/16034#issuecomment-4379929575


   Hi @benwtrent,
   I worked on the sibling expansion topic you discussed via email with
   @alessandrobenedetti.
   
   From your conversation:
   
   > I wonder if we can cheat and when we find a nearest child, we simply
   > gather and score ALL children of a parent ord, expecting them all to be near
   > and bulk collecting and scoring them. This sort of dynamic exploration would
   > allow min, max, and average score exploration (at some extra graph exploration
   > cost). It might even make the baseline max score exploration faster. This will
   > take some refactoring. I think if we made the KnnCollector interface keep track
   > of the visited set, it could be done. It also unlocks things in Elastic search
   > & Open search as we periodically want the nearest top paragraphs for each
   > nearest parent doc.
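
   To make the quoted idea concrete, here is a minimal, self-contained sketch of
   sibling expansion. It is only an illustration under stated assumptions, not the
   code in this PR and not Lucene's API: the class `SiblingExpansionSketch`, the
   `parentOf` array, the `onChildFound` hook, and the dot-product scoring are all
   hypothetical names and choices. When the search reaches one child, every
   unvisited child of the same parent is scored in bulk and marked in a shared
   visited set, so min, max, and average scores per parent are all available.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Lucene or from this PR.
final class SiblingExpansionSketch {

  /** Running min, max and average of the child scores seen for one parent. */
  static final class ParentScore {
    float min = Float.POSITIVE_INFINITY;
    float max = Float.NEGATIVE_INFINITY;
    float sum = 0f;
    int count = 0;

    void add(float score) {
      min = Math.min(min, score);
      max = Math.max(max, score);
      sum += score;
      count++;
    }

    float avg() {
      return count == 0 ? 0f : sum / count;
    }
  }

  private final int[] parentOf;      // assumption: parentOf[childOrd] = parent ord
  private final float[][] vectors;   // assumption: one vector per child ord
  private final Map<Integer, List<Integer>> childrenOf = new HashMap<>();
  private final BitSet visited;      // shared visited set, so no child is scored twice
  private final Map<Integer, ParentScore> scores = new HashMap<>();

  SiblingExpansionSketch(int[] parentOf, float[][] vectors) {
    this.parentOf = parentOf;
    this.vectors = vectors;
    this.visited = new BitSet(parentOf.length);
    for (int child = 0; child < parentOf.length; child++) {
      childrenOf.computeIfAbsent(parentOf[child], p -> new ArrayList<>()).add(child);
    }
  }

  /** Dot-product similarity; an assumption, any similarity function would do. */
  static float dot(float[] a, float[] b) {
    float s = 0f;
    for (int i = 0; i < a.length; i++) {
      s += a[i] * b[i];
    }
    return s;
  }

  /**
   * Called when the graph search reaches a child. Scores every unvisited child
   * of the same parent and returns how many vectors were scored.
   */
  int onChildFound(int childOrd, float[] query) {
    if (visited.get(childOrd)) {
      return 0; // already handled by an earlier sibling expansion
    }
    int parent = parentOf[childOrd];
    ParentScore agg = scores.computeIfAbsent(parent, p -> new ParentScore());
    int scored = 0;
    for (int sibling : childrenOf.get(parent)) {
      if (visited.get(sibling)) {
        continue;
      }
      visited.set(sibling);
      agg.add(dot(query, vectors[sibling]));
      scored++;
    }
    return scored;
  }

  ParentScore scoreOf(int parentOrd) {
    return scores.get(parentOrd);
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0f, 1f}, {1f, 1f}};
    int[] parentOf = {0, 0, 1}; // children 0 and 1 share parent 0
    SiblingExpansionSketch sketch = new SiblingExpansionSketch(parentOf, vectors);
    int scored = sketch.onChildFound(0, new float[] {1f, 0f});
    ParentScore p = sketch.scoreOf(0);
    System.out.println(scored + " scored; min=" + p.min + " max=" + p.max + " avg=" + p.avg());
  }
}
```

   The sketch also shows where the extra cost comes from: each expansion scores
   siblings that a plain graph traversal might never have visited.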
   
   Sorry for the long description of this draft PR!
   It lists all the changes made for the implementation and, above all, the
   benchmark results.
   This approach seems to improve the precision/recall of the returned results,
   but it does not reduce the overall time required for the computation.
   
   Let me know what you think about this :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

