[I] A new "Diversification" Type collector akin to FirstPassGroupingCollector [lucene]

via GitHub Tue, 16 Sep 2025 14:04:43 -0700


benwtrent opened a new issue, #15190:
URL: https://github.com/apache/lucene/issues/15190


   ### Description
   
   FirstPassGroupingCollector is pretty awesome, but we are fairly restricted in what we can actually group by.
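
For contrast, this is roughly what grouping looks like when the key is static. It is a minimal sketch in plain Java, not Lucene's API; `StaticKeyGrouping`, `Hit` and `topGroups` are hypothetical names. Each hit's group key is read straight from a field value, and the top N groups are ranked by their best-scoring hit. The real FirstPassGroupingCollector is configured with a `GroupSelector` and a group `Sort` and works incrementally during collection. Even so, the key property is the same: a hit's group never depends on which other hits have been seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of first-pass grouping by a static key (not Lucene's API). */
final class StaticKeyGrouping {
  /** A scored hit whose group key is a fixed field value, e.g. a keyword field. */
  record Hit(int doc, float score, String groupKey) {}

  /** Returns the keys of the top {@code topNGroups} groups, ranked by each group's best score. */
  static List<String> topGroups(List<Hit> hits, int topNGroups) {
    Map<String, Float> bestScore = new HashMap<>();
    for (Hit hit : hits) {
      // The key is read directly from the hit and never depends on other hits.
      bestScore.merge(hit.groupKey(), hit.score(), Float::max);
    }
    List<Map.Entry<String, Float>> ranked = new ArrayList<>(bestScore.entrySet());
    ranked.sort(Map.Entry.<String, Float>comparingByValue().reversed());
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < Math.min(topNGroups, ranked.size()); i++) {
      keys.add(ranked.get(i).getKey());
    }
    return keys;
  }
}
```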
   
   Search is evolving, and the desire to diversify results (whether to feed an LLM or just to show users) is becoming more and more important.
   
   I don't have a fully concrete idea, but it seems to me that Lucene should be able to support "grouping" by some statistics or requirements derived from another field.
   
   Two examples that come to mind:
   
    - Maximal marginal relevance (MMR)
    - Diversification based on clusters of vectors (e.g. k-means)
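
Maximal marginal relevance is usually applied as a greedy re-ranking over a candidate set. Each step picks the candidate that best trades off its relevance against its similarity to what has already been picked, weighted by a parameter `lambda` (1.0 means pure relevance). The sketch below is minimal and assumes that relevance scores and dense vectors for the candidates are already in memory. It uses cosine similarity as the redundancy measure, which is an assumption; any similarity works. `MmrReranker`, `select` and `lambda` are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical greedy MMR re-ranker over an in-memory candidate set (not a Lucene API). */
final class MmrReranker {
  /** Cosine similarity of two equal-length dense vectors; 0 if either is all zeros. */
  static double cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
  }

  /**
   * Greedily picks up to k candidates. Each step takes the unpicked candidate maximizing
   *   lambda * relevance[i] - (1 - lambda) * max(similarity to every already-picked candidate),
   * where the redundancy term is 0 before anything has been picked.
   * Returns candidate indices in pick order.
   */
  static List<Integer> select(float[] relevance, float[][] vectors, int k, double lambda) {
    int n = relevance.length;
    boolean[] picked = new boolean[n];
    double[] maxSimToPicked = new double[n];
    Arrays.fill(maxSimToPicked, Double.NEGATIVE_INFINITY);
    List<Integer> order = new ArrayList<>();
    for (int step = 0; step < Math.min(k, n); step++) {
      int best = -1;
      double bestMmr = Double.NEGATIVE_INFINITY;
      for (int i = 0; i < n; i++) {
        if (picked[i]) continue;
        double redundancy = order.isEmpty() ? 0 : maxSimToPicked[i];
        double mmr = lambda * relevance[i] - (1 - lambda) * redundancy;
        if (best == -1 || mmr > bestMmr) {
          best = i;
          bestMmr = mmr;
        }
      }
      picked[best] = true;
      order.add(best);
      // The score of every remaining candidate now depends on this pick.
      for (int i = 0; i < n; i++) {
        if (!picked[i]) {
          maxSimToPicked[i] = Math.max(maxSimToPicked[i], cosine(vectors[i], vectors[best]));
        }
      }
    }
    return order;
  }
}
```

Each pick changes the remaining candidates' scores. For that reason, this fits most naturally as a re-rank over a bounded candidate set (for example, the top hits from a first pass), rather than as something a single streaming pass decides hit by hit.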
   
   Both of these will be complicated in their own ways, because the groupings are dynamic: they change as more data is seen, instead of having a natural, static upper limit based on cardinality.
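
To illustrate why dynamic groupings are harder, here is a minimal sketch that swaps in single-pass "leader" clustering. This is a simpler technique than the k-means clustering mentioned above, chosen only because it fits in one streaming pass. Each incoming hit joins the most similar existing group if its cosine similarity to that group's first hit (the "leader") is at least `threshold`. Otherwise, it opens a new group. Each group keeps at most `perGroup` hits in a min-heap, because hits typically reach a collector in document order rather than score order. `LeaderDiversifier`, `threshold` and `perGroup` are hypothetical names, and cosine similarity over raw `float[]` vectors is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical streaming diversifier (not a Lucene API). Groups are discovered on the fly
 * with single-pass "leader" clustering, and each group keeps only its best perGroup hits.
 */
final class LeaderDiversifier {
  record Hit(int doc, float score, float[] vector) {}

  private static final class Group {
    final float[] leader; // vector of the hit that opened this group
    // Min-heap by score, so the weakest retained hit is evicted first.
    final PriorityQueue<Hit> kept = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));

    Group(float[] leader) {
      this.leader = leader;
    }
  }

  private final double threshold; // minimum cosine similarity needed to join an existing group
  private final int perGroup; // maximum hits retained per group
  private final List<Group> groups = new ArrayList<>();

  LeaderDiversifier(double threshold, int perGroup) {
    this.threshold = threshold;
    this.perGroup = perGroup;
  }

  /** Assigns the hit to its most similar group, opening a new group if none is close enough. */
  void collect(Hit hit) {
    Group target = null;
    double bestSim = 0;
    for (Group g : groups) {
      double sim = cosine(hit.vector(), g.leader);
      if (sim >= threshold && (target == null || sim > bestSim)) {
        target = g;
        bestSim = sim;
      }
    }
    if (target == null) {
      // The number of groups is not known up front; it grows as new regions of the space are seen.
      target = new Group(hit.vector());
      groups.add(target);
    }
    target.kept.add(hit);
    if (target.kept.size() > perGroup) {
      target.kept.poll();
    }
  }

  int groupCount() {
    return groups.size();
  }

  /** All retained hits across groups, best score first. */
  List<Hit> results() {
    List<Hit> out = new ArrayList<>();
    for (Group g : groups) {
      out.addAll(g.kept);
    }
    out.sort(Comparator.comparingDouble(Hit::score).reversed());
    return out;
  }

  /** Cosine similarity of two equal-length dense vectors; 0 if either is all zeros. */
  static double cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
  }
}
```

Which hits end up grouped together depends on both arrival order and `threshold`, and the group count has no bound known in advance. A grouping collector keyed on a field value does not have to deal with either problem.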
   
   But it seems generally useful for search.

