mikemccand commented on issue #15136:
URL: https://github.com/apache/lucene/issues/15136#issuecomment-3233790261

   +1 to somehow bring some love to Lucene's grouping collectors, to catch up 
with all the nice optimizations that the default collectors have!
   
   I don't know enough specifics to figure out how we would do this though ... 
the fact that it is grouping, and will do a 2nd pass once the top groups are 
identified from the 1st pass, makes it tricky?  Using the postings skip lists 
to jump to the next block that might be competitive (might alter the top 
collected groups) would need a weird mix of group values and the best score 
(according to the group sort criteria) the query might achieve?  Actually, I 
guess we would skip just based on the bottom of the priority queue collecting 
the top groups?  That should be a doable optimization.
   
   Say you index single-family home prices, and you group by zip code, and your 
query wants the top K groups when groups are sorted by `max(home_price)`. It 
seems like once you have collected K distinct zip codes, you could then look at 
the worst entry in your top K (index K-1, counting from 0), and then skip to 
the next doc / doc-block that has a home more expensive than that worst 
entry's max home price?  And if you are group-sorting instead by max(BM25 
relevance), you'd seek instead by the min required term freq for a hit to 
achieve the BM25 score of the worst group, maybe?  It's hard to think about ...
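
   To make the zip code example concrete, here is a rough sketch of the 
first-pass idea in plain Java.  It does not use any Lucene API: the class 
`TopGroupsSkipSketch`, the per-block `blockMax` array (a stand-in for whatever 
skip data the index would really expose), and the toy data are all made up for 
illustration.  It also finds the worst group with a linear scan where a real 
collector would presumably use its priority queue, and it assumes `k >= 1`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical first-pass collector: top K zip codes sorted by max(home_price). */
final class TopGroupsSkipSketch {
    /** Zip code -> best price seen so far, for the current top K groups only. */
    final Map<String, Long> top = new HashMap<>();
    /** Number of docs actually inspected (docs in skipped blocks are not counted). */
    int docsVisited;

    void collectAll(String[] zip, long[] price, int blockSize, int k) {
        // Assumed stand-in for index-time skip data: the max price in each doc block.
        int numBlocks = (price.length + blockSize - 1) / blockSize;
        long[] blockMax = new long[numBlocks];
        Arrays.fill(blockMax, Long.MIN_VALUE);
        for (int doc = 0; doc < price.length; doc++) {
            int b = doc / blockSize;
            blockMax[b] = Math.max(blockMax[b], price[doc]);
        }
        for (int b = 0; b < numBlocks; b++) {
            // Once K groups are held, a block only matters if it can beat the worst one.
            if (top.size() == k && blockMax[b] <= top.get(worstGroup())) {
                continue; // skip the whole block
            }
            int end = Math.min(price.length, (b + 1) * blockSize);
            for (int doc = b * blockSize; doc < end; doc++) {
                docsVisited++;
                collect(zip[doc], price[doc], k);
            }
        }
    }

    private void collect(String group, long p, int k) {
        Long current = top.get(group);
        if (current != null) {
            if (p > current) top.put(group, p);   // better doc for a group already in the top K
        } else if (top.size() < k) {
            top.put(group, p);                    // still filling the top K
        } else {
            String worst = worstGroup();
            if (p > top.get(worst)) {             // new group displaces the worst one
                top.remove(worst);
                top.put(group, p);
            }
        }
    }

    /** Linear scan for the group with the lowest max price (the "bottom" of the top K). */
    private String worstGroup() {
        String worst = null;
        for (Map.Entry<String, Long> e : top.entrySet()) {
            if (worst == null || e.getValue() < top.get(worst)) worst = e.getKey();
        }
        return worst;
    }

    public static void main(String[] args) {
        String[] zip = {"94110", "10001", "73301", "94110", "60601", "10001", "73301", "60601"};
        long[] price = {900, 800, 300, 400, 1200, 100, 500, 700};
        TopGroupsSkipSketch s = new TopGroupsSkipSketch();
        s.collectAll(zip, price, 2, 2); // blocks of 2 docs, top 2 groups
        // prints: {60601=1200, 94110=900}, visited 4 of 8
        System.out.println(new TreeMap<>(s.top) + ", visited " + s.docsVisited + " of " + price.length);
    }
}
```

   With this toy data the second and fourth blocks are skipped, because 
their max price cannot beat the worst group already held.  One caveat of the 
sketch: docs in skipped blocks are never seen, so any total hit or group 
counts taken during this pass would be lower bounds only.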


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

