benwtrent opened a new issue, #15190:
URL: https://github.com/apache/lucene/issues/15190
### Description
FirstPassGroupingCollector is pretty awesome, but we are fairly restricted
in what we can actually group by.
Search is evolving, and the desire to diversify results (to feed to an LLM,
or even just to show users) is getting more and more important.
I don't have a fully concrete idea, but it seems to me that Lucene should be
able to support a "grouping" by some statistics or requirements of another
field.
Two examples that come to mind:
- Maximal marginal relevance (MMR)
- Diversification based on clusters of vectors (e.g. kmeans)
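For readers unfamiliar with MMR, here is a minimal greedy sketch in plain Java. It assumes dense float vectors compared by cosine similarity and a trade-off parameter `lambda`. The class and method names are hypothetical, and this is not a Lucene API. Each step picks the unselected candidate maximizing lambda * sim(doc, query) - (1 - lambda) * max sim(doc, already selected), so what a document "competes against" depends on what has already been picked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical greedy MMR sketch; not a Lucene API.
final class MmrSketch {

  // Cosine similarity between two equal-length vectors (0 if either is all zeros).
  static double cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
  }

  // Greedily selects up to k candidate indices, trading relevance for novelty.
  static List<Integer> select(float[] query, float[][] docs, int k, double lambda) {
    List<Integer> selected = new ArrayList<>();
    boolean[] used = new boolean[docs.length];
    int limit = Math.min(k, docs.length);
    while (selected.size() < limit) {
      int best = -1;
      double bestScore = Double.NEGATIVE_INFINITY;
      for (int i = 0; i < docs.length; i++) {
        if (used[i]) {
          continue;
        }
        // Redundancy: highest similarity to anything already selected (0 for the first pick).
        double redundancy = selected.isEmpty() ? 0 : Double.NEGATIVE_INFINITY;
        for (int s : selected) {
          redundancy = Math.max(redundancy, cosine(docs[i], docs[s]));
        }
        double score = lambda * cosine(docs[i], query) - (1 - lambda) * redundancy;
        if (score > bestScore) {
          bestScore = score;
          best = i;
        }
      }
      used[best] = true;
      selected.add(best);
    }
    return selected;
  }
}
```

With query `{1, 0}` and candidates `{1, 0}`, `{0.99, 0.1}`, `{0.7, 0.7}`, `select(query, docs, 2, 0.3)` should return `[0, 2]`. The near-duplicate at index 1 is skipped even though it is more relevant than index 2, while `lambda = 1.0` falls back to plain relevance order, `[0, 1]`.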
Both of these will be complicated in their own ways because the groupings
end up being dynamic as more data is seen (instead of having a natural static
upper limit based on cardinality).
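To illustrate why such groupings are dynamic, here is a hypothetical single-pass sketch. It swaps k-means for simpler leader (threshold) clustering. Each hit, visited in rank order, joins the nearest existing cluster if it lies within a distance `threshold`, and otherwise opens a new cluster. At most `maxPerCluster` hits are kept per cluster. The class, its parameters, and the use of Euclidean distance are assumptions for illustration, not Lucene code. Unlike grouping on a field value, the number of groups is only known once the stream has been consumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-pass "leader clustering" diversifier; not a Lucene API.
final class LeaderDiversifier {
  private final double threshold;   // max Euclidean distance for joining an existing cluster
  private final int maxPerCluster;  // how many hits any one cluster may contribute
  private final List<float[]> leaders = new ArrayList<>(); // first member of each cluster
  private final List<Integer> counts = new ArrayList<>();  // hits kept per cluster

  LeaderDiversifier(double threshold, int maxPerCluster) {
    if (maxPerCluster < 1) {
      throw new IllegalArgumentException("maxPerCluster must be >= 1");
    }
    this.threshold = threshold;
    this.maxPerCluster = maxPerCluster;
  }

  // Offer hits in rank order; returns true if this hit should be kept.
  boolean offer(float[] vector) {
    int nearest = -1;
    double nearestDist = Double.POSITIVE_INFINITY;
    for (int c = 0; c < leaders.size(); c++) {
      double d = distance(vector, leaders.get(c));
      if (d < nearestDist) {
        nearestDist = d;
        nearest = c;
      }
    }
    if (nearest == -1 || nearestDist > threshold) {
      // Too far from every known cluster: a new group appears mid-stream.
      leaders.add(vector.clone());
      counts.add(1);
      return true;
    }
    if (counts.get(nearest) >= maxPerCluster) {
      return false; // cluster already full; drop the hit in favor of diversity
    }
    counts.set(nearest, counts.get(nearest) + 1);
    return true;
  }

  int clusterCount() {
    return leaders.size();
  }

  // Euclidean distance between two equal-length vectors.
  static double distance(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }
}
```

With `threshold = 0.5` and `maxPerCluster = 1`, offering `{0, 0}`, `{0.1, 0}`, `{5, 5}`, `{5.1, 5}`, `{10, 0}` in that order should keep the first, third, and fifth hits and end with three clusters. A true k-means variant would also move centroids as hits arrive, letting earlier assignments drift; this sketch sidesteps that by fixing each leader at its first member.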
But it seems generally useful for search.
--
This is an automated message from the Apache Git Service.