mikemccand commented on issue #15136: URL: https://github.com/apache/lucene/issues/15136#issuecomment-3233790261
+1 to somehow bring some love to Lucene's grouping collectors, so they catch up with all the nice optimizations that the default collectors have!

I don't know enough specifics to figure out how we would do this though ... the fact that it is grouping, and will do a 2nd pass once the top groups are identified from the 1st pass, makes it tricky? Using the postings skip lists to jump to the next block that might be competitive (might alter the top collected groups) would be a weird mix of group values, and the best score (according to the group sort criteria) the query might achieve?

Actually, I guess we would skip just based on the bottom of the priority queue collecting the top groups? That should be a doable optimization. Say you index single family home prices, and you group by zip code, and your query wants the top K groups when groups are sorted by `max(home_price)`. It seems like once you have collected K distinct zip codes, you could then look at the worst entry in your top K (index K-1), and then skip to the next doc / doc-block that has homes more expensive than that worst entry's best home price?

And if you are group-sorting instead by `max(BM25 relevance)`, you'd seek instead by the min required term freq for a hit to achieve the BM25 score of the worst group, maybe? It's hard to think about ...
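To make the `max(home_price)` idea concrete, here is a minimal, self-contained sketch of a first pass that skips blocks using only the bottom of the top-K groups. It does not use any Lucene API: `GroupSkipSketch`, `Home`, `FirstPass`, and the per-block maximum are all hypothetical stand-ins, and the block maximum is computed on the fly where a real index would presumably read it from stored per-block metadata. It only covers the first pass (finding the top groups), not the second pass that collects documents within them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupSkipSketch {

  /** One indexed home: zip code is the group key, price is the group sort value. */
  static final class Home {
    final String zip;
    final int price;
    Home(String zip, int price) { this.zip = zip; this.price = price; }
  }

  /** First-pass result: best price per top group, plus how many blocks were skipped. */
  static final class FirstPass {
    final Map<String, Integer> topGroups;
    final int skippedBlocks;
    FirstPass(Map<String, Integer> topGroups, int skippedBlocks) {
      this.topGroups = topGroups;
      this.skippedBlocks = skippedBlocks;
    }
  }

  /** Returns the group with the lowest best price (the bottom of the top K), or null if empty. */
  static String bottomGroup(Map<String, Integer> top) {
    String worst = null;
    for (Map.Entry<String, Integer> e : top.entrySet()) {
      if (worst == null || e.getValue() < top.get(worst)) {
        worst = e.getKey();
      }
    }
    return worst;
  }

  static FirstPass collectTopGroups(List<List<Home>> blocks, int k) {
    if (k < 1) throw new IllegalArgumentException("k must be >= 1");
    Map<String, Integer> top = new HashMap<>();
    int skipped = 0;
    for (List<Home> block : blocks) {
      // Assumption: stands in for a precomputed per-block max stored with the skip data.
      int blockMax = Integer.MIN_VALUE;
      for (Home h : block) blockMax = Math.max(blockMax, h.price);

      // Once K groups are held, a block whose best price cannot beat the bottom
      // group can neither admit a new group nor raise an existing group's max.
      if (top.size() == k && blockMax <= top.get(bottomGroup(top))) {
        skipped++;
        continue;
      }
      for (Home h : block) {
        Integer current = top.get(h.zip);
        if (current != null) {
          if (h.price > current) top.put(h.zip, h.price); // group improves
        } else if (top.size() < k) {
          top.put(h.zip, h.price);                        // queue not full yet
        } else {
          String worst = bottomGroup(top);
          if (h.price > top.get(worst)) {                 // new group displaces the bottom
            top.remove(worst);
            top.put(h.zip, h.price);
          }
        }
      }
    }
    return new FirstPass(top, skipped);
  }

  /** Made-up sample data: four blocks of homes, in index order. */
  static List<List<Home>> sampleBlocks() {
    List<List<Home>> blocks = new ArrayList<>();
    blocks.add(Arrays.asList(new Home("94107", 900), new Home("10001", 1200), new Home("73301", 300)));
    blocks.add(Arrays.asList(new Home("73301", 250), new Home("94107", 400)));
    blocks.add(Arrays.asList(new Home("60601", 700), new Home("10001", 500)));
    blocks.add(Arrays.asList(new Home("94107", 1500), new Home("30301", 100)));
    return blocks;
  }

  public static void main(String[] args) {
    FirstPass result = collectTopGroups(sampleBlocks(), 2);
    System.out.println("top groups: " + result.topGroups);
    System.out.println("skipped blocks: " + result.skippedBlocks);
  }
}
```

With K = 2 on this sample data, the second and third blocks are skipped: after the first block the bottom group's best price is 900, and neither block contains a home above that. The linear scan in `bottomGroup` keeps the sketch short; a real collector would more likely keep the groups in a priority queue so the bottom is available directly.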
